Description

The present invention relates to viruses capable of
inducing lymphadenopathies (denoted below by the
abbreviation LAS) and acquired immunodepressive
syndromes (denoted below by the abbreviation AIDS), to
antigens of said viruses, particularly in a purified form,
and to processes for producing these antigens,
particularly antigens of the envelopes of these viruses.
The invention also relates to polypeptides, whether
glycosylated or not, encoded by the DNA sequences of said
viruses.

The invention also relates to cloned DNA sequences
hybridizable to genomic RNA and DNA of the new
lymphadenopathy associated viruses (LAV) disclosed
hereafter, to processes for their preparation and their
uses. It relates more particularly to stable probes
including a DNA sequence which can be used for the
detection of the new LAV viruses or related viruses or DNA
proviruses in any medium, particularly biological samples,
containing any of them.

An important genetic polymorphism has been recognized for
the human retrovirus at the origin of the acquired immune
deficiency syndrome (AIDS) and other diseases, like
lymphadenopathy syndrome (LAS), AIDS-related complex (ARC)
and probably some encephalopathies (for review see Weiss,
1984). Indeed all of the isolates analyzed until now have
a distinct restriction map, even if recovered from the
same place and time (BENN et al., 1985). Identical
restriction maps have only been observed for the first two
isolates, designated lymphadenopathy-associated virus, LAV
(ALIZON et al., 1984) and human T-cell lymphotropic virus
type 3, HTLV-3 (HAHN et al., 1984), and this thus appears
as an exception. The genetic polymorphism of the AIDS
virus was better assessed after the determination of the
complete nucleotide sequence of LAV (WAIN-HOBSON et al.,
1985), HTLV-3 (RATNER et al., 1985 ; MUESING et al., 1985)
and of a third isolate designated AIDS-associated
retrovirus, ARV (SANCHEZ-PESCADOR et al., 1985). In
particular it appeared that, besides the nucleic acid
variations responsible for the restriction map
polymorphism, isolates could differ significantly at the
protein level, especially in the envelope (up to 13 % of
difference between ARV and LAV), by both amino acid
substitutions and reciprocal insertions-deletions (RABSON
and MARTIN, 1985).

Nevertheless the differences mentioned above do not go so
far as to destroy a sufficient level of immunological
relationship, as evidenced by the capability of similar
proteins, i.e. core proteins of similar nature, such as
the p25 proteins, or similar envelope glycoproteins, such
as the 110-120 kD glycoproteins, to cross-react
immunologically. Accordingly the proteins of any of said
LAV viruses can be used for the in vitro detection of
antibodies induced in vivo and present in biological
fluids obtained from individuals infected with the other
LAV variants. Therefore these viruses are grouped in a
class of LAV viruses, hereafter generally said to belong
to the class of LAV-1 viruses.

The invention stems from the discovery of new viruses
which, although held responsible for diseases which are
clinically related to AIDS and still belonging to the
class of "LAV-1 viruses", differ genetically to a much
larger extent from the above-mentioned LAV variants.

The new viruses are basically characterized by the DNA
sequences which are shown in Figures 7A to 7J (LAVeli) and
Figures 8A to 8I (LAVmal) respectively.

The invention further relates to variants of the new
viruses the RNAs of which, or the related cDNAs derived
from said RNAs, are hybridizable to corresponding parts of
the cDNAs of either LAVeli or LAVmal.

The invention also relates to the DNAs themselves of said
viruses, hybridizable with the genomic RNA of either
LAVeli or LAVmal. Particularly said DNAs consist of said
cDNAs or of recombinant DNAs containing said cDNAs.

It further relates to DNA recombinants containing DNAs of
either LAVeli or LAVmal or of related viruses. It is of
course understood that DNAs which would include some
deletions or mutations which would not substantially alter
their capability of also hybridizing with the retroviral
genomes of LAVeli or LAVmal are to be considered as
forming obvious equivalents of the DNAs more specifically
referred to hereabove.

The invention also relates more specifically to cloned
probes which can be made starting from any DNAs according
to the invention, thus to recombinant DNAs containing such
DNAs, particularly any plasmids amplifiable in procaryotic
or eucaryotic cells and carrying said DNAs.

Using the cloned DNA containing a DNA of LAVeli or of
LAVmal as a molecular hybridization probe, labelled either
with radionucleotides or with fluorescent reagents, LAV
virion RNA may be detected directly, e.g. in the blood,
body fluids and blood products (e.g. antihemophilic
factors such as Factor VIII concentrates). A suitable
method for achieving that detection comprises immobilizing
virus onto a support, e.g. nitrocellulose filters,
disrupting the virion and hybridizing with labelled
(radiolabelled or "cold" fluorescent- or enzyme-labelled)
probes. Such an approach has already been developed for
Hepatitis B virus in peripheral blood (according to SCOTTO
J. et al., Hepatology (1983), 3, 379-384).

Probes according to the invention can also be used for
rapid screening of genomic DNA derived from the tissue of
patients with LAV related symptoms, to see if the proviral
DNA or RNA present in host tissue and other tissues is
related to LAVeli or LAVmal.

A method which can be used for such screening comprises
the following steps : extraction of DNA from tissue,
restriction enzyme cleavage of said DNA, electrophoresis
of the fragments, Southern blotting of the genomic DNA
from tissues, and subsequent hybridization with labelled
cloned LAV proviral DNA. Hybridization in situ can also be
used.
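The restriction cleavage step can be illustrated in silico. The following is only a minimal sketch, assuming a linear DNA molecule and complete digestion by HindIII (recognition site AAGCTT, cut after the first A); the function name `digest` and the toy sequence are hypothetical and are not taken from the figures.

```python
import re

# HindIII recognises AAGCTT and cuts between the first and second base (A^AGCTT).
HINDIII_SITE = "AAGCTT"
HINDIII_CUT_OFFSET = 1


def digest(sequence, site=HINDIII_SITE, cut_offset=HINDIII_CUT_OFFSET):
    """Return the fragment lengths, 5' to 3', of a complete digest of linear DNA."""
    sequence = sequence.upper()
    # A lookahead is used so that every occurrence of the site is reported.
    pattern = "(?=" + re.escape(site) + ")"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical toy sequence carrying two HindIII sites.
toy = "GGGG" + "AAGCTT" + "CCCCCCCC" + "AAGCTT" + "TT"
fragments = digest(toy)
print(fragments)
```

Comparing such fragment lengths between isolates is the computational counterpart of comparing band patterns after electrophoresis.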

Lymphatic fluids and tissues and other non-lymphatic
tissues of humans, primates and other mammalian species
can also be screened to see if other evolutionarily
related retroviruses exist. The methods referred to
hereabove can be used, although hybridization and washings
would be done under non-stringent conditions.

The DNA according to the invention can also be used for
achieving the expression of LAV viral antigens for
diagnostic purposes as well as for the production of a
vaccine against LAV. Fragments of particular advantage in
that respect will be discussed later.

The methods which can be used are multifold :

a) DNA can be transfected into mammalian cells
with appropriate selection markers by a variety
of techniques : calcium phosphate precipitation,
polyethylene glycol, protoplast fusion, etc.

b) DNA fragments corresponding to genes can
be cloned into expression vectors for E. coli,
yeast or mammalian cells and the resultant
proteins purified.

c) The proviral DNA can be "shot-gunned"
(fragmented) into procaryotic expression vectors
to generate fusion polypeptides. Recombinants
producing antigenically competent fusion proteins
can be identified by simply screening the
recombinants with antibodies against LAVeli or
LAVmal antigens.

Particular reference in that respect is made to those
portions of the genomes of LAVeli and LAVmal which, in the
drawings, are shown to belong to open reading frames and
which encode the products having the polypeptidic
backbones shown.

More particularly, the invention relates to the different
polypeptides which appear in figures 7A to 8I. Methods
disclosed in European application 0 178 978 and in PCT
application PCT/EP 85/00548 filed on October 18, 1985 are
applicable for the production of such peptides from the
corresponding viruses.

The present invention further aims at providing
polypeptides containing sequences in common with
polypeptides comprising antigenic determinants included in
the proteins encoded and expressed by the LAVeli or LAVmal
genome. An additional object of the invention is to
further provide means for the detection of proteins
related to the LAV viruses, particularly for the diagnosis
of AIDS or pre-AIDS or, to the contrary, for the detection
of antibodies against the LAV virus or proteins related
therewith, particularly in patients afflicted with AIDS or
pre-AIDS or more generally in asymptomatic carriers and in
blood-related products. Finally the invention also aims at
providing immunogenic polypeptides, and more particularly
protective polypeptides for use in the preparation of
vaccine compositions against AIDS or related syndromes.

The invention relates also to polypeptide fragments having
lower molecular weights and having peptide sequences or
fragments in common with those shown in figures 7A to 8I.
Fragments of smaller sizes may be obtained by resorting to
known techniques. For instance such a method comprises
cleaving the original larger polypeptide by enzymes
capable of cleaving it at specific sites. By way of
examples of such enzymes, may be mentioned the enzyme of
Staphylococcus aureus V8, α-chymotrypsin, "mouse
sub-maxillary gland protease" marketed by the BOEHRINGER
company, Vibrio alginolyticus chemovar iophagus
collagenase, which specifically recognizes the peptides
Gly-Pro and Gly-Ala, etc.

Other features of this invention will appear in the
following disclosure of the data obtained starting from
LAVeli and LAVmal, in relation to the drawings in which :

- Figs. 1A and 1B provide restriction maps of the genomes
of LAVeli and LAVmal as compared to LAVbru (a known LAV
isolate deposited at CNCM under number I-232 on July
15th, 1983) ;

- Fig. 2 shows the comparative maps setting forth the
relative positions of the open reading frames of the
above genomes ;

- Figs. 3A-3F (sometimes also designated globally
hereafter by fig. 3) indicate the relative correspondence
between the proteins (or glycoproteins) encoded by the
open reading frames, whereby amino acid residues of
protein sequences of LAVeli and LAVmal are in vertical
alignment with corresponding amino acid residues
(numbered) of corresponding or homologous proteins or
glycoproteins of LAVbru ;

- Figs. 4A-4B (sometimes also designated globally
hereafter by fig. 4) provide for quantitation of the
sequence divergence between homologous proteins of
LAVbru, LAVeli and LAVmal ;

- Fig. 5 shows diagrammatically the degree of divergence
of the different virus envelope proteins ;

- Figs. 6A and 6B (or Fig. 6 when viewed altogether)
render apparent the direct repeats which appear in the
proteins of the different AIDS virus isolates ;

- Figs. 7A-7J and 8A-8I show the full nucleotidic
sequences of LAVeli and LAVmal respectively.

RESULTS 

Characterization and molecular cloning of two 
African isolates. 

The different AIDS virus isolates concerned are designated
by three letters of the patient's name, LAVbru referring
to the prototype AIDS virus isolated in 1983 from a French
homosexual patient with LAS and thought to have been
infected in the USA in the preceding years (Barré-Sinoussi
et al., 1983). Both of the African patients originated
from Zaire ; LAVeli was recovered in 1983 from a 24 year
old woman with AIDS, and LAVmal in 1985 from a 7 year old
boy with ARC, probably infected in 1981 after a blood
transfusion in Zaire, since his parents were
LAV-seronegative.

Recovery and purification of each of the two 
viruses were performed according to the method 
disclosed in European Patent Application 84 
401834/138 667 filed on September 9, 1984. 

LAVeli and LAVmal are indistinguishable from the
previously characterized isolates by their structural and
biological properties in vitro. Virus metabolic labelling
and immune precipitation by the sera of patients ELI and
MAL, as well as by reference sera, showed that the
proteins of LAVeli and LAVmal had the same molecular
weight (MW) as, and cross-reacted immunologically with,
those of the prototype AIDS virus of the "LAV-1" class
(data not shown).

Reference is again made to European Application 178 978
and International Application PCT/EP 85/00548 as concerns
the purification, mapping and sequencing procedures used
herein. See also "experimental procedures" and "legends of
the figures" hereafter.

Primary restriction enzyme analysis of the LAVeli and
LAVmal genomes was done by Southern blot with total DNA
derived from acutely infected lymphocytes, using the
cloned complete LAVbru genome as probe. Overall
cross-hybridization was observed under stringent
conditions, but the restriction profiles of the Zairian
isolates were clearly different. Phage lambda clones
carrying the complete viral genetic information were
obtained and further characterized by restriction mapping
and nucleotide sequence analysis ; clone E-H12 is derived
from LAVeli-infected cells and contains an integrated
provirus with 5' flanking cellular sequences but a
truncated 3' long terminal repeat (LTR) ; clone M-H11 was
obtained by complete HindIII restriction of DNA from
LAVmal-infected cells, taking advantage of the existence
of a unique HindIII site in the LTR. M-H11 is thus
probably derived from unintegrated viral DNA since that
species was at least ten times more abundant than
integrated provirus.

Figure 1B gives a comparison of the restriction maps of
LAVeli, LAVmal and prototype LAVbru, all three being
derived from their nucleotide sequences, as well as of
three Zairian isolates previously mapped for seven
restriction enzymes (Benn et al., 1985). Despite this
limited number, all of the profiles are clearly different
(out of the 23 sites making up the map of LAVbru only
seven are present in all six maps presented), confirming
the genetic polymorphism of the AIDS virus. No obvious
relationship is apparent between the five Zairian maps,
and all of their common sites are also found in LAVbru.

Conservation of the genetic organization. 



The genetic organization of LAVeli and LAVmal as deduced
from the complete nucleotide sequences of their cloned
genomes is identical to that found in other isolates, i.e.
5'-gag-pol-central region-env-F-3'. Most noticeable is the
conservation of the "central region" (fig. 2), located
between the pol and env genes, which is composed of a
series of overlapping open reading frames (orf) we had
previously designated Q, R, S, T and U after observing a
similar organization in the ovine lentivirus visna (Sonigo
et al., 1985). The product of orf S (also designated
"tat") is implicated in the transactivation of virus
expression (Sodroski et al., 1985 ; Arya et al., 1985) ;
the biological role of the product of orf Q (also
designated "sor" or orf A) is still unknown (Lee et al.,
1986 ; Kang et al., 1986). Of the three other orfs (R, T
and U), only orf R is likely to be a seventh viral gene,
for the following reasons : the exact conservation of its
relative position with respect to Q and S (fig. 2), the
constant presence of a possible splice acceptor and of a
consensus AUG initiator codon, its similar codon usage
with respect to viral genes, and finally the fact that the
variation of its protein sequence within the different
isolates is comparable to that of gag, pol and Q (see
Fig. 4).

Also conserved are the sizes of the U3, R and U5 elements
of the LTR (data not shown), the location and sequence of
their regulatory elements such as the TATA box and the
AATAAA polyadenylation signal, and their flanking
sequences, i.e. the primer binding site (PBS)
complementary to the 3' end of the primer tRNA and the
polypurine tract (PPT). Most of the genetic variability
within the LTR is located in the 5' half of U3 (which
encodes a part of orf F) while the 3' end of U3 and R,
which carry most of the cis-acting regulatory elements
(promoter, enhancer and trans-activating factor receptor ;
Rosen et al., 1985), as well as the U5 element are
well conserved.

Overall, it clearly appears that the Zairian isolates
belong to the same type of retrovirus as the previously
sequenced isolates of American or European origin.

Variability of the viral proteins. 

Despite their identical genetic organization, these
isolates show substantial differences in the primary
structure of their proteins. The amino acid sequences of
LAVeli and LAVmal proteins are presented in figures 3A-3F
(to be examined in conjunction with Figs. 7A-7J and
8A-8I), aligned with those of LAVbru and ARV 2. Their
divergence was quantified as the percentage of amino acid
substitutions in two-by-two alignments (Fig. 4). We have
also scored the number of insertions and deletions that
had to be introduced in each of these alignments.
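The two scores just described can be sketched as follows. This is only an illustration under stated assumptions: the two sequences are already aligned, gaps are written "-", each run of gaps on one side counts as one insertion/deletion event, and the toy sequences and the name `divergence` are hypothetical. The exact scoring rules used for Fig. 4 are not given in the text.

```python
def divergence(aligned_a, aligned_b, gap="-"):
    """Return (percent substitutions, number of indel events) for two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0        # columns where both sequences carry a residue
    substitutions = 0   # compared columns where the residues differ
    indel_events = 0    # runs of consecutive gaps on the same side
    gap_side = None     # which sequence currently carries a gap run
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # column empty in both sequences: ignored
        if a == gap or b == gap:
            side = "a" if a == gap else "b"
            if side != gap_side:
                indel_events += 1
            gap_side = side
        else:
            gap_side = None
            compared += 1
            if a != b:
                substitutions += 1
    percent = 100.0 * substitutions / compared if compared else 0.0
    return percent, indel_events


# Hypothetical aligned fragments, not taken from figure 3.
seq_a = "MKVLAGT-PWC"
seq_b = "MRVL--TQPWS"
print(divergence(seq_a, seq_b))
```

Gapped columns are excluded from the percentage so that substitutions and insertions/deletions are reported separately, as in the text.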

Three general observations can be made. First, the protein
sequences of the African isolates are more divergent from
LAVbru than are those of HTLV-3 and ARV 2 (Fig. 4A) ;
similar results are obtained if ARV 2 is taken as
reference (not shown). The range of genetic polymorphism
between isolates of the AIDS virus is considerably greater
than previously observed. Second, our two sequences
confirm that the envelope is more variable than the gag
and pol genes. Here again, the relatively small difference
observed between the env of LAVbru and HTLV-3 appears as
an exception. Third, the mutual divergence of the two
African isolates (Fig. 4B) is comparable to that between
LAVbru and either of them ; as far as we can extrapolate
from only three sequenced isolates from the USA and Europe
and two from Africa, this is indicative of a wider
evolution of the AIDS virus in Africa.

gag and pol : Their greater degree of conservation
compared to the envelope is consistent with their encoding
important structural or enzymatic activities. Of the three
mature gag proteins, the p25, which was the first
recognized immunogenic protein of LAV (Barré-Sinoussi et
al., 1983), is also the best conserved (fig. 3). In gag
and pol, differences between isolates are principally due
to point mutations, and only a small number of insertional
or deletional events is observed. Among these, we must
note the presence in the overlapping part of gag and pol
of LAVbru of an insertion of 12 amino acids (AA) which is
encoded by the second copy of a 36 bp direct repeat
present only in this isolate and in HTLV-3. This
duplication was omitted because of a computing error in
the published sequence of LAVbru (position 1712,
Wain-Hobson et al., 1985) but was indeed present in the
HTLV-3 sequences (Ratner et al., 1985 ; Muesing et al.,
1985).

env : Three segments can be distinguished in the envelope
glycoprotein precursor (Allan et al., 1985 ; Montagnier et
al., 1985 ; DiMarzo Veronese et al., 1985). The first is
the signal peptide (positions 1-33 in fig. 3), and its
sequence appears as variable ; the second segment (pos.
34-530) forms the outer membrane protein (OMP or gp110)
and carries most of the genetic variations, and in
particular almost all of the numerous reciprocal
insertions and deletions ; the third segment (531-877) is
separated from the OMP by a potential cleavage site
following a constant basic stretch (Arg-Glu-Lys-Arg) and
forms the transmembrane protein (TMP or gp41) responsible
for the anchorage of the envelope glycoprotein in the
cellular membrane. A better conservation of the TMP than
the OMP has also been observed between the different
murine leukemia viruses (MLV, Koch et al., 1983), and
could be due to structural constraints.

From the alignment of figure 3 and the graphical
representation of the envelope variability shown in figure
5, we clearly see the existence of conserved domains, with
little or no genetic variation, and hypervariable domains,
in which even the alignment of the different sequences is
very difficult, because of the existence of a large number
of mutations and of reciprocal insertions and deletions.
We have not included the sequence of the envelope of the
HTLV-3 isolate since it is so close to that of LAVbru (cf.
fig. 4), even in the hypervariable domains, that it did
not add anything to the analysis. While this graphical
representation will be refined by more sequence data, the
general profile is already apparent, with three
hypervariable domains (Hy1, 2 and 3) all being located in
the OMP, and separated by three well-conserved stretches
(residues 37-130, 211-289, and 488-530 of fig. 3
alignment) probably associated with important biological
functions.
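One simple way to obtain a variability profile of this kind is to score, in a sliding window, the fraction of alignment columns at which the isolates differ. The sketch below is an assumption made for illustration only: the text does not state how figure 5 was computed, and the window size, the name `variability_profile` and the toy alignment are hypothetical.

```python
def variability_profile(alignment, window=5):
    """Fraction of variable columns in each sliding window of a multiple alignment.

    `alignment` is a list of equal-length aligned sequences; a column is called
    variable when the sequences do not all carry the same character (a gap
    facing a residue therefore counts as a difference).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    variable = [len({seq[i] for seq in alignment}) > 1 for i in range(length)]
    return [
        sum(variable[start:start + window]) / window
        for start in range(length - window + 1)
    ]


# Hypothetical three-sequence alignment: conserved start, variable end.
toy_alignment = ["ACDEFGHIKL", "ACDEFGHVRL", "ACDEFAHLKM"]
profile = variability_profile(toy_alignment, window=5)
print(profile)
```

Windows scoring near 0 correspond to conserved stretches and windows scoring near 1 to hypervariable ones.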

In spite of the extreme genetic variability, the folding
pattern of the envelope glycoprotein is probably constant.
Indeed the position of virtually all of the cysteine
residues is conserved within the different isolates (fig.
3 and 5), and the only three variable cysteines fall
either in the signal peptide or in the very C-terminal
part of the TMP. The hypervariable domains of the OMP are
bounded by conserved cysteines, suggesting that they may
represent loops attached to the common folding pattern.
Also the calculated hydropathic profiles (Kyte and
Doolittle, 1982) of the different envelope proteins are
remarkably conserved (not shown).
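A hydropathic profile of the Kyte and Doolittle type is a moving average of per-residue hydropathy indices along the protein. The sketch below uses the published Kyte-Doolittle index values; the window length of 7 and the name `hydropathy_profile` are assumptions chosen for illustration, since the text does not give the parameters actually used.

```python
# Kyte-Doolittle hydropathy indices (one-letter amino acid codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=7):
    """Mean hydropathy index over each window of `window` consecutive residues."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical peptide: a hydrophobic stretch followed by a charged one.
print(hydropathy_profile("LLVIAVLKRDEEKR", window=7))
```

Two envelope proteins can then be compared by plotting their profiles against the aligned residue positions.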

About half of the potential N-glycosylation sites,
Asn-X-Ser/Thr, found in the envelopes of the Zairian
isolates map to the same positions in LAVbru (17/26 for
LAVeli and 17/28 for LAVmal). The other sites appear to
fall within variable domains of env, suggesting the
existence of differences in the extent of envelope
glycosylation between different isolates.
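Counting shared potential N-glycosylation sites can be sketched as below. The text defines the site as Asn-X-Ser/Thr; the exclusion of proline at position X is a commonly applied refinement and is an assumption here, as are the function name `sequon_columns` and the toy aligned sequences (gaps written "-").

```python
import re

# Asn-X-Ser/Thr; X = any residue except Pro (the Pro exclusion is an assumption).
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_columns(aligned, gap="-"):
    """Alignment columns (0-based) of the Asn of each potential N-glycosylation site."""
    # Column index in the alignment of every residue of the ungapped protein.
    columns = [i for i, char in enumerate(aligned) if char != gap]
    ungapped = aligned.replace(gap, "").upper()
    return {columns[m.start()] for m in SEQUON.finditer(ungapped)}


# Hypothetical aligned envelope fragments, not taken from figure 3.
reference = "MNGTK--NPSANVS"
isolate = "MNGSKQQNASA-VT"
shared = sequon_columns(reference) & sequon_columns(isolate)
print(len(shared), "of", len(sequon_columns(isolate)), "sites shared")
```

Working in alignment columns rather than residue numbers lets sites be compared across isolates that differ by insertions and deletions.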

Other viral proteins : Of the three other identified viral
proteins, the p27 encoded by orf F, 3' of env (Allan et
al., 1985b) is the most variable (fig. 4). The proteins
encoded by orfs Q and S of the central region are
remarkable by their absence of insertions/deletions.
Surprisingly, a high frequency of amino acid
substitutions, comparable to that observed in env, is
found for the product of orf S (trans-activating factor).
On the other hand, the protein encoded by orf Q is no more
variable than gag. Also noticeable is the lower variation
of the proteins encoded by the central regions of LAVeli
and LAVmal.

DISCUSSION 

With the availability of the complete nucleotide sequence
from five independent isolates, some general features of
the AIDS virus genetic variability are now emerging.
Firstly, its principal cause is point mutations, which
very often result in amino acid substitutions and which
are more frequent in the 3' part of the genome (orf S, env
and orf F). Like all RNA viruses, the retroviruses are
thought to be highly subject to mutations caused by errors
of the RNA polymerases during their replication, since
there is no proofreading of this step (Holland et al.,
1982 ; Steinhauer and Holland, 1986).

Another source of genetic diversity is
insertions/deletions. From the figure 3 alignments,
insertional events seem to be implicated in most of the
cases, since otherwise deletions should have occurred in
independent isolates at precisely the same location.
Furthermore, upon analyzing these insertions, we have
observed that they most often represent one of the two
copies of a direct repeat (fig. 6). Some are perfectly
conserved like the 36 bp repeat in the gag-pol overlap of
LAVbru (fig. 6-a) ; others carry point mutations resulting
in amino acid substitutions, and as a consequence, they
are more difficult to observe, though clearly present, in
the hypervariable domains of env (cf. fig. 6-g and -h). As
noted for point mutations, the env gene and orf F also
appear as more susceptible to that form of genetic
variation than the rest of the genome. The degree of
conservation of these repeats must be related to their
date of occurrence in the analyzed sequences : the more
degenerated, the more ancient. A very recent divergence of
LAVbru and HTLV-3 is suggested by the extremely low number
of mismatched AA between their homologous proteins.
However, one of the LAVbru repeats (located in the Hy1
domain of env, fig. 6-f) is not present in HTLV-3,
indicating that this generation of tandem repeats is a
rapid source of genetic diversity. We have found no traces
of such a phenomenon, even when comparing very closely
related viruses, such as the Mason-Pfizer monkey virus,
MPMV (Sonigo et al., 1986), and an immunosuppressive
simian virus, SRV-1 (Power et al., 1986). Insertion or
deletion of one copy of a direct repeat has been
occasionally reported in mutant retroviruses (Shimotohno
and Temin, 1981 ; Darlix, 1986), but the extent to which
we observe this phenomenon is unprecedented.
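Perfectly conserved tandem direct repeats, such as the 36 bp duplication mentioned above, can be located by a naive scan. The sketch below is an illustration only: the name `tandem_repeats`, the minimum unit length and the toy sequence are hypothetical, and degenerate repeats carrying point mutations (like those of the hypervariable domains) would require a search that tolerates mismatches.

```python
def tandem_repeats(sequence, min_unit=6):
    """Return (start, unit_length) for every perfect tandem duplication.

    A hit means sequence[start:start + unit_length] is immediately followed
    by an identical copy of itself.
    """
    sequence = sequence.upper()
    hits = []
    for unit in range(min_unit, len(sequence) // 2 + 1):
        for start in range(len(sequence) - 2 * unit + 1):
            first = sequence[start:start + unit]
            second = sequence[start + unit:start + 2 * unit]
            if first == second:
                hits.append((start, unit))
    return hits


# Hypothetical sequence: an 8 bp unit duplicated in tandem between flanks.
toy_sequence = "TTG" + "ACCATGCA" + "ACCATGCA" + "GGT"
print(tandem_repeats(toy_sequence))
```

Removing one copy of a reported unit from one isolate and re-aligning is a quick check that an apparent insertion is indeed a duplication.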

The molecular basis of these duplications is unclear, but
could be the "copy-choice" phenomenon, resulting from the
diploidy of the retroviral genome (Varmus and Swanstrom,
1984 ; Clark and Mak, 1983). During the synthesis of the
first strand of the viral DNA, jumps are known to occur
from one RNA molecule to another, especially when a break
or a stable secondary structure is present on the
template ; an inaccurate re-initiation on the other RNA
template could result in the generation (or the
elimination) of a short direct repeat.

Genetic variability, and subsequent antigenic
modifications, have often been developed by microorganisms
as a means to escape the host's immune response, either by
modifying their epitopes during the course of the
infection, as in trypanosomes (Borst and Cross, 1982), or
by generating a large repertoire of antigens, as observed
in influenza virus (Webster et al., 1982). As the human
AIDS virus is related to animal lentiviruses (Sonigo et
al., 1985 ; Chiu et al., 1985), its genetic variability
could be a source of antigenic variation, as can be
observed during the course of the infection by the ovine
lentivirus visna (Scott et al., 1979 ; Clements et al.,
1980) or by the equine infectious anemia virus (EIAV,
Montelaro et al., 1984). However, a major discrepancy with
these animal models is the extremely low, if any,
neutralizing activity of the sera of individuals infected
by the AIDS virus, whether they are healthy carriers,
displaying minor symptoms or afflicted with AIDS (Weiss et
al., 1985 ; Clavel et al., 1985). Furthermore, even for
the visna virus the exact role of antigenic variation in
the pathogenesis is unclear (Thormar et al., 1983 ; Lutley
et al., 1983). We rather feel that genetic variation
represents a general selective advantage for lentiviruses
by allowing an adaptation to different environments, for
example by modifying their tissue or host tropisms. In the
particular case of the AIDS virus, rapid genetic
variations are tolerated, especially in the envelope ;
they could allow the virus to adapt to different
"micro-environments" of the membrane of its principal
target cells, namely the T4 lymphocytes. These
"micro-environments" could result from the immediate
vicinity of the virus receptor to polymorphic surface
proteins, differing either between individuals or between
clones of lymphocytes.

Conserved domains in the AIDS virus envelope. 

Since the proteins of most of the isolates are
antigenically cross-reactive, the genotypic differences do
not seem to affect the sensitivity of current diagnostic
tests, based upon the detection of antibodies to the AIDS
virus and using purified virions as antigens. They
nevertheless have to be considered for the development of
the "second-generation" tests, that are expected to be
more specific, and will use smaller synthetic or
genetically-engineered viral antigens. The identification
of conserved domains in the highly immunogenic envelope
glycoprotein, and also the core structural proteins (gag),
is very important for these tests. The conserved stretch
found at the end of the OMP and the beginning of the TMP
(490-620, fig. 3) could be a good candidate, since a
bacterial fusion protein containing this domain was well
detected by AIDS patients' sera (Chang et al., 1985).

The envelope, specifically the OMP, mediates the
interaction between a retrovirus and its specific cellular
receptor (DeLarco and Todaro, 1976 ; Robinson et al.,
1980). In the case of the AIDS virus, in vitro binding
assays have shown the interaction of the envelope
glycoprotein gp110 with the T4 cellular surface antigen
(McDougal et al., 1986), already thought to be, or to be
closely associated with, the virus receptor (Klatzmann et
al., 1984 ; Dalgleish et al., 1984). Identification of the
AIDS virus envelope domains that are responsible for this
interaction (receptor-binding domains) appears as
fundamental for understanding the host-viral interactions,
but also for designing a protective vaccine, since an
immune response against these epitopes could possibly
elicit neutralizing antibodies. As the AIDS virus receptor
is at least partly formed of a constant structure, the T4
antigen, the binding site of the envelope is unlikely to
be exclusively encoded by domains undergoing drastic
genetic changes between isolates, even if these could be
implicated in some kind of an "adaptation". One or several
of the conserved domains of the OMP (residues 37-130,
211-289, and 488-530 of fig. 3 alignment), brought
together by the folding of the protein, must play a part
in the virus-receptor interaction, and this can be
explored with synthetic or genetically-engineered peptides
derived from these domains, either by direct binding
assays, or indirectly by assaying the neutralizing
activity of specific antibodies raised against them.

African AIDS viruses 

Zaire and the neighbouring countries of Central Africa are
considered as an endemic area for the AIDS virus
infection, and the possibility that the virus has emerged
in Africa has become a subject of intense controversy (see
Norman, 1985). From the present study, it is clear that
the genetic organization of Zairian isolates is the same
as that of American isolates, thereby indicating a common
origin. The very important sequence differences observed
between the proteins are consistent with a divergent
evolutionary process. In addition, the two African
isolates are mutually more divergent than the American
isolates already analyzed ; as far as that observation can
be extrapolated, it suggests a longer evolution of the
virus in Africa, and is also consistent with the fact that
a larger fraction of the population is exposed than in
developed countries.

A novel human retrovirus with morphology and biological
properties (cytopathogenicity, T4 tropism) similar to
those of LAV, but nevertheless clearly genetically and
antigenically distinct from the latter, was recently
isolated from two patients with AIDS originating from
Guinea Bissau, West Africa (Clavel et al., 1986). In the
neighbouring Senegal the population seems exposed to a
retrovirus also distinct from LAV, but apparently
non-pathogenic (Barin et al., 1985 ; Kanki et al., 1986).
Both of these novel African retroviruses seem to be
antigenically related to the simian T-cell lymphotropic
virus, STLV-III, shown to be widely present in healthy
African green monkeys and other simian species (Kanki et
al., 1985). This raises the possibility of a large group
of African primate lentiviruses, ranging from the
apparently non-pathogenic simian viruses to the LAV-type
viruses. Their precise relationship will only be known
after their complete genetic characterization, but it is
already very likely that they have evolved from a common
progenitor. The important genetic variability we have
observed between isolates of the AIDS virus in Central
Africa is probably a hallmark of this entire group, and
may account for the apparently important genetic
divergence between its members (loss of cross-antigenicity
in the envelopes). In this sense the conservation of the
tropism for the T4 lymphocytes suggests that it is a major
advantage acquired by these retroviruses.
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EXPERIMENTAL PROCEDURES 

Virus isolations 

LAVeli and LAVmal were isolated from the
peripheral blood lymphocytes of the patients as
described (Barré-Sinoussi et al., 1983) ; briefly, the
lymphocytes were fractionated and co-cultivated
with phytohaemagglutinin-stimulated normal human
lymphocytes in the presence of interleukin 2 and
anti-alpha interferon serum. Viral production was
assessed by cell-free reverse transcriptase (RT)
activity assay in the cultures and by electron
microscopy.

Molecular cloning

Normal donor lymphocytes were acutely
infected (10* cpm of RT activity/10^ cells) as
described (Barré-Sinoussi et al., 1983), and total DNA
was extracted at the beginning of the RT activity
peak. For LAVeli, a lambda library using the L47-1
vector (Loenen and Brammar, 1982) was constructed
by partial HindIII digestion of the DNA as
already described (Alizon et al., 1984). For LAVmal,
DNA from infected cells was digested to completion
with HindIII and the 9-10 kb fraction was
selected on 0.8 % low melting point agarose gel and
ligated into L47-1 HindIII arms. About 5.10^
plaques for LAVeli and 2.10^ for LAVmal, obtained
by in vitro packaging (Amersham), were plated on
E. coli LA101 and screened in situ under stringent
conditions, using the 9 kb SacI insert of the clone
lambda J19 (Alizon et al., 1984) carrying most of
the LAVbru genome as probe. Clones displaying
positive signals were plaque-purified and propagated
on E. coli C600 recBC, and two recombinant
phages carrying the complete genetic information
of LAVeli (E-H12) and LAVmal (M-H11) were further
characterized by restriction mapping.
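
The size selection applied to the LAVmal DNA (complete HindIII digestion, then recovery of the 9-10 kb fraction) can be illustrated in silico. The following is only a minimal sketch under stated assumptions: the function names are hypothetical, the DNA is treated as a linear single-strand string, and the only facts relied upon are the HindIII recognition sequence AAGCTT and its cut after the first A. It is not the procedure of the authors, who worked on agarose gel.

```python
HINDIII_SITE = "AAGCTT"   # HindIII recognition sequence
HINDIII_CUT_OFFSET = 1    # HindIII cuts after the first A (A^AGCTT)


def complete_digest(seq, site=HINDIII_SITE, offset=HINDIII_CUT_OFFSET):
    """Return the fragments of a linear sequence after complete digestion."""
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def size_select(fragments, min_len, max_len):
    """Keep the fragments whose length lies in [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Toy sequence (hypothetical, far shorter than a real provirus).
toy = "GG" + "AAGCTT" + "C" * 10 + "AAGCTT" + "TT"
fragments = complete_digest(toy)
selected = size_select(fragments, 10, 20)
```

With a real proviral sequence the window would be `size_select(fragments, 9000, 10000)`, corresponding to the 9-10 kb fraction mentioned above.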

Nucleotide sequence strategy 

Viral fragments derived from E-H12 and M-H11
were sequenced by the dideoxy chain terminator
procedure (Sanger et al., 1977) after "shotgun"
cloning in the M13mp8 vector (Messing and Vieira,
1982), as previously described (Sonigo et al.,
1985). The viral genome of LAVeli is 9176
nucleotides long, that of LAVmal 9229 nucleotides.
Each nucleotide was determined from more than 5
independent clones on average. Complete
nucleotide sequences are not presented in this
article for obvious reasons of space limitation but
are freely available upon request to the authors,
until they are released through sequence data
banks.



LEGEND OF THE FIGURES 

Figure 1 : Restriction map analysis of AIDS virus 
isolates. 

A/ Restriction maps of the inserts of phage
lambda clones derived from cells infected with
LAVeli (E-H12) and LAVmal (M-H11). The schematic
genetic organization of the AIDS virus has
been drawn above the maps. The LTRs are
indicated by solid boxes. A : AvaI - B : BamHI -
Bg : BglII - E : EcoRI - H : HindIII - Hc : HincII -
K : KpnI - N : NdeI - P : PstI - S : SacI - X : XbaI.
Asterisks indicate the HindIII cloning sites in the
lambda L47-1 vector.

B/ Comparison of the sites for seven restriction
enzymes in six isolates : the prototype AIDS
virus LAVbru, LAVmal and LAVeli ; Z1, Z2, Z3
are Zairian isolates with published restriction
maps (Benn et al., 1985). Restriction sites are
represented by symbols for the following enzymes :
BglII ; EcoRI ; HincII ; HindIII ; KpnI ; NdeI ; SacI.

Figure 2 : Conservation of the genetic organization
of the central region in AIDS virus isolates.

Stop codons in each phase are represented as
vertical bars. Vertical arrows indicate possible AUG
initiation codons. Splice acceptor (A) and donor
(D) sites identified in subgenomic viral mRNA
(Muesing et al., 1986) are shown below the graphic
of LAVbru, and corresponding sites in LAVeli and
LAVmal are indicated. PPT indicates the repeat of
the polypurine tract flanking the 3' LTR. As
observed in LAVbru (Wain-Hobson et al., 1985), the
PPT is repeated 256 nucleotides 5' to the end of
the pol gene in both our sequences, but this repeat
is degenerated at two positions in LAVeli.

Figure 3 : Alignment of the protein sequences of
four AIDS virus isolates.

Isolate LAVbru (Wain-Hobson et al., 1985) is
taken as reference ; only differences with LAVbru
are noted for ARV2 (Sanchez-Pescador et al.,
1985) and the two Zairian isolates LAVmal and
LAVeli. A minimal number of gaps (-) was
introduced in the alignments. The NH2-termini of p25gag
and p18gag are indicated (Sanchez-Pescador,
1985). The potential cleavage sites in the envelope
precursor (Allan et al., 1985a ; diMarzo-Veronese,
1985) separating the signal peptide (SP), the outer
membrane protein (OMP) and the transmembrane
protein (TMP) are indicated as vertical arrows ;
conserved cysteines are indicated by black circles
and variable cysteines are boxed. The one letter
code for amino acids is : A:Ala ; C:Cys ; D:Asp ;
E:Glu ; F:Phe ; G:Gly ; H:His ; I:Ile ; K:Lys ; L:Leu ;
M:Met ; N:Asn ; P:Pro ; Q:Gln ; R:Arg ; S:Ser ;
T:Thr ; V:Val ; W:Trp ; Y:Tyr.

Figure 4 : Quantitation of the sequence divergence 
between homologous proteins of different isolates. 

Part A of each table gives results deduced
from two-by-two alignments using the proteins of
LAVbru as reference, part B those of LAVeli as
reference. Sources : Muesing et al., 1985 for HTLV-3 ;
Sanchez-Pescador et al., 1985 for ARV 2 and
Wain-Hobson et al., 1985 for LAVbru. For each
box of the tables, the size in amino-acids of the
protein (calculated from the first methionine
residue, or from the beginning of the orf for pol) is
given at the upper left part. Below are given the
number of deletions (left) and insertions (right)
necessary for the alignment. The large numbers in
bold face represent the percentage of amino-acid
substitutions (insertions/deletions being excluded).
Two-by-two alignments were done with computer
assistance (Wilbur and Lipman, 1983), using a gap
penalty of 1, K-tuple of 1 and window of 20, except
for the hypervariable domains of env, where the
number of gaps was made minimum, and which
are essentially aligned as shown in fig. 3. The
sequence of the predicted protein encoded by orf
R of HTLV-3 has not been compared because of a
premature termination relative to all other isolates.
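
The quantities tabulated in Figure 4 can be recomputed from any two-by-two alignment. The sketch below is an illustration under stated assumptions, not the program used for the figure: the function name is hypothetical, the two sequences are assumed to be already aligned with "-" as gap character, and insertions and deletions are counted per gapped position (the figure may count them per gap event).

```python
def substitution_stats(ref_aln, other_aln, gap="-"):
    """Count substitutions, insertions and deletions relative to a reference.

    Both arguments are rows of the same pairwise alignment. The percentage of
    substitutions is computed over positions where both sequences carry a
    residue, so insertions and deletions are excluded as in Figure 4.
    """
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned sequences must have the same length")
    compared = substitutions = insertions = deletions = 0
    for r, o in zip(ref_aln, other_aln):
        if r == gap and o == gap:
            continue                      # empty column, nothing to compare
        if r == gap:
            insertions += 1               # residue absent from the reference
        elif o == gap:
            deletions += 1                # reference residue missing
        else:
            compared += 1
            if r != o:
                substitutions += 1
    percent = 100.0 * substitutions / compared if compared else 0.0
    return {
        "compared": compared,
        "substitutions": substitutions,
        "insertions": insertions,
        "deletions": deletions,
        "percent_substitutions": percent,
    }


# Toy alignment (hypothetical sequences).
stats = substitution_stats("MKV-LT", "MRVAL-")
```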

Figure 5 : Variability of the AIDS virus envelope 
protein. 

For each position x of the alignment of env
(Fig. 3), variability V(x) was calculated as

V(x) = (number of different amino-acids at position x)
       / (frequency of the most abundant amino-acid at
       position x).

Gaps in the alignments are considered as another
amino-acid. For an alignment of 4 proteins,
V(x) ranges from 1 (identical AA in the 4
sequences) to 16 (4 different AA). This type of
representation has previously been used in a
compilation of the AA sequences of immunoglobulin
variable regions (Wu and Kabat, 1970). Vertical arrows
indicate the cleavage sites ; asterisks represent
potential N-glycosylation sites (N-X-S/T) conserved
in all four isolates ; black triangles represent
conserved cysteine residues. Black lozenges mark the
three major hydrophobic domains. OMP : outer
membrane protein ; TMP : transmembrane protein ;
signal : signal peptide ; Hy1, 2, 3 : hypervariable
domains.
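
As an illustration of the variability index of Figure 5, the calculation can be sketched as follows. This is a minimal sketch, not the authors' program: the function names are hypothetical, and the frequency of the most abundant amino-acid is taken as a fraction of the number of aligned sequences, an assumption that reproduces the stated range of 1 to 16 for four sequences. Gaps are counted as another amino-acid, as stated in the legend.

```python
from collections import Counter


def variability(column):
    """Variability of one alignment column (gaps count as a residue)."""
    counts = Counter(column)
    n_different = len(counts)
    top_frequency = max(counts.values()) / len(column)
    return n_different / top_frequency


def variability_profile(aligned_sequences):
    """Return V(x) for every column of a set of aligned sequences."""
    if len({len(s) for s in aligned_sequences}) != 1:
        raise ValueError("aligned sequences must have the same length")
    return [variability(col) for col in zip(*aligned_sequences)]


# Toy alignment of four sequences (hypothetical).
profile = variability_profile(["AAA", "AAC", "ACD", "A-E"])
```

In this toy alignment the first column is invariant, the second carries three distinct symbols with the most abundant one present in half of the sequences, and the third carries four different residues.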

Figure 6 : Direct repeats in the proteins of different 
AIDS virus isolates. 



These examples are derived from the aligned
sequences of gag (a, b), F (c, d) and env (e, f, g, h)
shown in figure 3. The two elements of the direct
repeat are boxed, while degenerated positions are
underlined.

The invention thus pertains more specifically to
the proteins, polypeptides or glycoproteins including
the polypeptidic structures shown in the drawings.
The first and last amino-acid residues of
these proteins, polypeptides or glycoproteins carry
numbers computed from a first amino-acid of the
open reading frames concerned, although these
numbers do not correspond exactly to those of the
LAVeli or LAVmal proteins concerned, but rather to
those of the corresponding LAVbru proteins or
sequences shown in figs. 3A, 3B and 3C. Thus a
number corresponding to a "first amino-acid
residue" of a LAVeli protein corresponds to the number
of the first amino-acyl residue of the corresponding
LAVbru protein which, in any of figs.
3A, 3B or 3C, is in direct alignment with the
corresponding first amino-acid of the LAVeli protein.
Thus the sequences concerned can be read from
figs. 7A-7J and 8A-8I, to the extent where they do
not appear with sufficient clarity from figs. 3A-3F.
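
The numbering convention described above (residues of LAVeli or LAVmal carry the number of the LAVbru residue aligned with them) can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the two rows are taken from an existing alignment with "-" as gap character, and residues facing a gap in the reference are given no number.

```python
def reference_numbering(ref_aln, other_aln, gap="-"):
    """Number the residues of `other_aln` after the aligned reference.

    Returns a list of (residue, reference_number) pairs; reference_number is
    None when the residue faces a gap in the reference sequence.
    """
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned sequences must have the same length")
    numbered = []
    ref_position = 0
    for r, o in zip(ref_aln, other_aln):
        if r != gap:
            ref_position += 1             # advance along the reference
        if o != gap:
            numbered.append((o, ref_position if r != gap else None))
    return numbered


# Toy alignment (hypothetical): the second protein starts at reference
# residue 2 and carries one residue absent from the reference.
pairs = reference_numbering("MK-VL", "-KAVL")
```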

The preferred protein sequences of this invention
extend from the corresponding "first" and
"last" amino-acid residues (reference is also made
to the protein or glycoprotein portions including
part of the sequences which follow) :

OMP or gp110 proteins, including precursors :
1 to 530

OMP or gp110 without precursor :
34-530

Sequence carrying the TMP or gp41 protein :
531-877, particularly
680-700

well conserved stretches of OMP :
37-130,
211-289 and
488-530

well conserved stretch found at the end of the
OMP and the beginning of TMP :
490-620.

Proteins containing or consisting of the "well
conserved stretches" are of particular interest for
the production of immunogenic compositions and
(preferably in relation to the stretches of the env
protein) of vaccine compositions against the LAV
viruses of class 1 as above-defined.

The invention concerns more particularly all the
DNA fragments which have been more specifically
referred to in the drawings and which correspond
to open reading frames. It will be understood that
the man skilled in the art will be able to obtain
them all, for instance by cleaving an entire DNA
corresponding to the complete genome of either
LAVeli or LAVmal, such as by cleavage by a
partial or complete digestion thereof with a suitable
restriction enzyme and by the subsequent recovery
of the relevant fragments. The different DNAs
disclosed above can also be resorted to as a source
of suitable fragments. The techniques disclosed in
the PCT application for the isolation of the fragments,
which can then be included in suitable plasmids,
are applicable here too.

Of course other methods can be used. Some
of them have been exemplified in European
Application Nr. 178,978 filed on September 17, 1985.
Reference is for instance made to the following
methods.

a) DNA can be transfected into mammalian cells
with appropriate selection markers by a variety
of techniques : calcium phosphate precipitation,
polyethylene glycol, protoplast fusion, etc.

b) DNA fragments corresponding to genes can
be cloned into expression vectors for E. coli,
yeast or mammalian cells and the resultant
proteins purified.

c) The proviral DNA can be "shot-gunned"
(fragmented) into procaryotic expression vectors
to generate fusion polypeptides. Recombinants
producing antigenically competent fusion
proteins can be identified by simply screening the
recombinants with antibodies against LAV
antigens.

The invention further refers more specifically to
DNA recombinants, particularly modified vectors,
including any of the preceding DNA sequences and
adapted to transform corresponding microorganisms
or cells, particularly eucaryotic cells such as
yeasts, for instance Saccharomyces cerevisiae, or
higher eucaryotic cells, particularly cells of
mammals, and to permit expression of said DNA
sequences in the corresponding microorganisms or
cells. General methods of that type have been
recalled in the abovesaid PCT international patent
application PCT/EP 85/00548 filed on October 18,
1985.

More particularly the invention relates to such
DNA recombinant vectors modified by the
abovesaid DNA sequences and which are capable
of transforming higher eucaryotic cells, particularly
mammalian cells. Preferably any of the abovesaid
sequences are placed under the direct control of a
promoter contained in said vectors and which is
recognized by the polymerases of said cells, such
that the first nucleotide codons expressed
correspond to the first triplets of the above-defined
DNA sequences. Accordingly this invention also
relates to the corresponding DNA fragments which
can be obtained from genomes of LAVeli or LAVmal
or corresponding cDNAs by any appropriate
method. For instance such a method comprises
cleaving said LAV genomes or cDNAs by restriction
enzymes, preferably at the level of restriction
sites surrounding said fragments and close to the
opposite extremities respectively thereof, recovering
and identifying the fragments sought according
to sizes, if need be checking their restriction maps
or nucleotide sequences (or by reaction with
monoclonal antibodies specifically directed against
epitopes carried by the polypeptides encoded by said
DNA fragments), and further, if need be, trimming
the extremities of the fragments, for instance by an
exonucleolytic enzyme such as Bal31, for the
purpose of controlling the desired nucleotide
sequences of the extremities of said DNA fragments
or, conversely, repairing said extremities with
Klenow enzyme and possibly ligating the latter to
synthetic polynucleotide fragments designed to
permit the reconstitution of the nucleotide
extremities of said fragments. Those fragments may then
be inserted in any of said vectors for causing the
expression of the corresponding polypeptide by the
cell transformed therewith. The corresponding
polypeptide can then be recovered from the
transformed cells, if need be after lysis thereof, and
purified by methods such as electrophoresis.
Needless to say, all conventional methods for
performing these operations can be resorted to.

The invention also relates more specifically to
cloned probes which can be made starting from
any DNA fragment according to this invention, thus
to recombinant DNAs containing such fragments,
particularly any plasmids amplifiable in procaryotic
or eucaryotic cells and carrying said fragments.

Using the cloned DNA fragments as a molecular
hybridization probe - either by labelling with
radio-nucleotides or with fluorescent reagents -
LAV virion RNA may be detected directly in the
blood, body fluids and blood products (e.g. the
antihemophilic factors such as Factor VIII
concentrates) and vaccines, i.e. hepatitis B vaccine. It
has already been shown that whole virus can be
detected in culture supernatants of LAV-producing
cells. A suitable method for achieving that detection
comprises immobilizing virus onto a support,
e.g. nitrocellulose filters, disrupting the virion
and hybridizing with labelled (radiolabelled or
"cold" fluorescent- or enzyme-labelled) probes.
Such an approach has already been developed for
hepatitis B virus in peripheral blood (according to
SCOTTO J. et al., Hepatology (1983), 3, 379-384).
Probes according to the invention can also be
used for rapid screening of genomic DNA derived
from the tissue of patients with LAV-related
symptoms, to see if the proviral DNA or RNA present in
host tissue and other tissues can be related to that
of LAVeli or LAVmal.

A method which can be used for such screening
comprises the following steps : extraction of
DNA from tissue, restriction enzyme cleavage of
said DNA, electrophoresis of the fragments,
Southern blotting of the genomic DNA from tissues and
subsequent hybridization with labelled cloned LAV
proviral DNA. Hybridization in situ can also be
used.

Lymphatic fluids and tissues and other
non-lymphatic tissues of humans, primates and other
mammalian species can also be screened to see if
other evolutionarily related retroviruses exist. The
methods referred to hereabove can be used,
although hybridization and washings would be done
under non-stringent conditions.

The DNAs or DNA fragments according to the
invention can be used also for achieving the
expression of viral antigens of LAVeli or LAVmal for
diagnostic purposes.

The invention relates generally to the polypeptides
themselves, whether synthesized chemically,
isolated from viral preparations or expressed by the
different DNAs of the invention, particularly by the
ORFs or fragments thereof, in appropriate hosts,
particularly procaryotic or eucaryotic hosts, after
transformation thereof with a suitable vector
previously modified by the corresponding DNAs.

More generally, the invention also relates to
any of the polypeptide fragments (or molecules,
particularly glycoproteins having the same
polypeptidic backbone as the polypeptides mentioned
hereabove) bearing an epitope characteristic of a
protein or glycoprotein of LAVeli or LAVmal, which
polypeptide or molecule then has N-terminal and
C-terminal extremities respectively either free or,
independently from each other, covalently bonded to
aminoacids other than those which are normally
associated with them in the larger polypeptides or
glycoproteins of the LAV virus, which last
mentioned aminoacids are then free or belong to
another polypeptidic sequence. Particularly the
invention relates to hybrid polypeptides containing any
of the epitope-bearing-polypeptides which have
been defined more specifically hereabove,
recombined with other polypeptide fragments normally
foreign to the LAV proteins, having sizes sufficient
to provide for an increased immunogenicity of the
epitope-bearing-polypeptide, said foreign
polypeptide fragments either being immunogenically
inert or not interfering with the immunogenic
properties of the epitope-bearing-polypeptide.

Such hybrid polypeptides, which may contain
from 5 up to 150, even 250 aminoacids, usually
consist of the expression products of a vector
which contained ab initio a nucleic acid sequence
expressible under the control of a suitable
promoter or replicon in a suitable host, which nucleic
acid sequence had however beforehand been
modified by insertion therein of a DNA sequence
encoding said epitope-bearing-polypeptide.

Said epitope-bearing-polypeptides, particularly
those whose N-terminal and C-terminal aminoacids
are free, are also accessible by chemical synthesis,
according to techniques well known in the chemistry
of proteins.

The synthesis of peptides in homogeneous
solution and in solid phase is well known.

In this respect, recourse may be had to the
method of synthesis in homogeneous solution
described by Houben-Weyl in the work entitled
"Methoden der Organischen Chemie" (Methods of
Organic Chemistry) edited by E. WUNSCH, vol.
15-I and II, THIEME, Stuttgart 1974.

This method of synthesis consists of successively
condensing either the successive aminoacids
in twos, in the appropriate order, or successive
peptide fragments previously available or formed
and already containing several aminoacyl residues
in the appropriate order respectively. Except for the
carboxyl and amino groups which will be engaged
in the formation of the peptide bonds, care must be
taken to protect beforehand all other reactive
groups borne by these aminoacyl groups or
fragments. However, prior to the formation of the
peptide bonds, the carboxyl groups are advantageously
activated, according to methods well known in
the synthesis of peptides. Alternatively, recourse
may be had to coupling reactions bringing into play
conventional coupling reagents, for instance of the
carbodiimide type, such as
1-ethyl-3-(3-dimethylaminopropyl)-carbodiimide. When the aminoacid
group used carries an additional amine group (e.g.
lysine) or another acid function (e.g. glutamic acid),
these groups may be protected by carbobenzoxy
or t-butyloxycarbonyl groups, as regards the amine
groups, or by t-butylester groups, as regards the
carboxylic groups. Similar procedures are available
for the protection of other reactive groups ; for
example, an SH group (e.g. in cysteine) can be
protected by an acetamidomethyl or paramethoxybenzyl
group.

In the case of progressive synthesis, aminoacid
by aminoacid, the synthesis starts preferably by
the condensation of the C-terminal aminoacid with
the aminoacid which corresponds to the neighboring
aminoacyl group in the desired sequence and
so on, step by step, up to the N-terminal
aminoacid. Another preferred technique which can be
relied upon is that described by R.D. Merrifield in
"solid phase peptide synthesis" (J. Am. Chem.
Soc., 85, 2149-2154).

In accordance with the Merrifield process, the 
first C-terminal aminoacid of the chain is fixed to a 
suitable porous polymeric resin, by means of its 
carboxylic group, the amino group of said 
aminoacid then being protected, for example by a 
t-butyloxycarbonyl group. 

When the first C-terminal aminoacid is thus
fixed to the resin, the protective group of the amine
group is removed by washing the resin with an
acid, i.e. trifluoroacetic acid, when the protective
group of the amine group is a t-butyloxycarbonyl
group.

Then the carboxylic group of the second
aminoacid, which is to provide the second
aminoacyl group of the desired peptidic sequence,
is coupled to the deprotected amine group of the
C-terminal aminoacid fixed to the resin. Preferably,
the carboxyl group of this second aminoacid has
been activated, for example by
dicyclohexylcarbodiimide, while its amine group has been
protected, for example by a t-butyloxycarbonyl group.
The first part of the desired peptide chain, which
comprises the first two aminoacids, is thus
obtained. As previously, the amine group is then
deprotected, and one can further proceed with the
fixing of the next aminoacyl group and so forth until
the whole peptide sought is obtained.

The protective groups of the different side
groups, if any, of the peptide chain so formed can
then be removed. The peptide sought can then be
detached from the resin, for example by means of
hydrofluoric acid, and finally recovered in pure
form from the acid solution according to
conventional procedures.
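
The order of operations of the solid-phase process described above (attachment of the C-terminal aminoacid, then repeated deprotection and coupling up to the N-terminal aminoacid, then removal of the protective groups and detachment) can be summarized by a short sketch. It only enumerates the steps named in the text; the function name and the one-letter sequence are hypothetical, and no reaction condition beyond those stated above is implied.

```python
def merrifield_steps(sequence):
    """List the operations for building `sequence` (written N- to C-terminus).

    The chain grows from the C-terminal residue, which is attached first.
    """
    if not sequence:
        raise ValueError("sequence must contain at least one residue")
    residues = list(sequence)
    steps = [f"attach Boc-{residues[-1]} to the resin through its carboxyl group"]
    for residue in reversed(residues[:-1]):
        steps.append("remove the Boc group with trifluoroacetic acid")
        steps.append(f"couple carboxyl-activated Boc-{residue} to the free amine")
    steps.append("remove the remaining protective groups")
    steps.append("detach the peptide from the resin, e.g. with hydrofluoric acid")
    return steps


# Hypothetical tripeptide Gly-Ala-Lys, written N- to C-terminus.
for step in merrifield_steps("GAK"):
    print(step)
```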

As regards the peptide sequences of smallest
size and bearing an epitope or immunogenic
determinant, and more particularly those which are
readily accessible by chemical synthesis, it may be
required, in order to increase their in vivo
immunogenic character, to couple or "conjugate"
them covalently to a physiologically acceptable and
non-toxic carrier molecule.

By way of examples of carrier molecules or
macromolecular supports which can be used for
making the conjugates according to the invention,
will be mentioned natural proteins, such as tetanus
toxoid, ovalbumin, serum albumins, hemocyanins,
etc. Synthetic macromolecular carriers, for example
polylysines or poly(D-L-alanine)-poly(L-lysine)s,
can be used too.

Other types of macromolecular carriers which
can be used, which generally have molecular
weights higher than 20,000, are known from the
literature.

The conjugates can be synthesized by known
processes, such as described by Frantz and
Robertson in "Infect. and Immunity", 33, 193-198
(1981), or by P.E. Kauffman in "Applied and
Environmental Microbiology", October 1981, Vol. 42, n° 4,
611-614.

For instance the following coupling agents can
be used : glutaric aldehyde, ethyl chloroformate,
water-soluble carbodiimides
(N-ethyl-N'-(3-dimethylaminopropyl)carbodiimide, HCl),
diisocyanates, bis-diazobenzidine, di- and
trichloro-s-triazines, cyanogen bromides, benzoquinone, as
well as coupling agents mentioned in "Scand. J.
Immunol.", 1978, vol. 8, p. 7-23 (Avrameas,
Ternynck, Guesdon).

Any coupling process can be used for bonding
one or several reactive groups of the peptide, on
the one hand, and one or several reactive groups
of the carrier, on the other hand. Again coupling is
advantageously achieved between carboxyl and
amine groups carried by the peptide and the
carrier or vice-versa in the presence of a coupling
agent of the type used in protein synthesis, i.e.
1-ethyl-3-(3-dimethylaminopropyl)-carbodiimide,
N-hydroxybenzotriazole, etc. Coupling between
amine groups respectively borne by the peptide
and the carrier can also be made with
glutaraldehyde, for instance according to the method
described by BOQUET, P. et al. (1982) Molec.
Immunol., 19, 1441-1549, when the carrier is
hemocyanin.

The immunogenicity of epitope-bearing-peptides
can also be reinforced by oligomerisation
thereof, for example in the presence of
glutaraldehyde or any other suitable coupling agent. In
particular, the invention relates to the water-soluble
immunogenic oligomers thus obtained, comprising
particularly from 2 to 10 monomer units.

The glycoproteins, proteins and polypeptides
(generally designated hereafter as "antigens") of
this invention, whether obtained (by methods such
as disclosed in the earlier patent applications
referred to above) in a purified state from LAVeli or
LAVmal virus preparations or - as concerns more
particularly the peptides - by chemical synthesis,
are useful in processes for the detection of the
presence of anti-LAV antibodies in biological
media, particularly biological fluids such as sera from
man or animal, particularly with a view to possibly
diagnosing LAS or AIDS.

Particularly the invention relates to an in vitro
process of diagnosis making use of an envelope
glycoprotein (or of a polypeptide bearing an
epitope of this glycoprotein) of LAVeli or LAVmal for
the detection of anti-LAV antibodies in the sera
of persons who carry them. Other polypeptides -
particularly those carrying an epitope of a core
protein - can be used too.

A preferred embodiment of the process of the
invention comprises :

- depositing a predetermined amount of one or
several of said antigens in the cups of a
titration microplate ;

- introducing increasing dilutions of the
biological fluid, i.e. serum, to be diagnosed into
these cups ;

- incubating the microplate ;

- washing carefully the microplate with an
appropriate buffer ;

- adding into the cups specific labelled
antibodies directed against blood
immunoglobulins and

- detecting the antigen-antibody complex
formed, which is then indicative of the
presence of LAV antibodies in the biological fluid.

Advantageously the labelling of the
anti-immunoglobulin antibodies is achieved by an enzyme
selected from among those which are capable of
hydrolysing a substrate, which substrate undergoes
a modification of its radiation-absorption, at least
within a predetermined band of wavelengths. The
detection of the substrate, preferably comparatively
with respect to a control, then provides a
measurement of the potential risks or of the effective
presence of the disease.

Thus preferred methods are immuno-enzymatic or
also immunofluorescent detections, in particular
according to the ELISA technique. Titrations may be
determinations by immunofluorescence or direct or
indirect immuno-enzymatic determinations.
Quantitative titrations of antibodies in the sera
studied can be made.
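
A quantitative titration of the kind mentioned above can be sketched as an endpoint-titre calculation on the absorbances read for increasing dilutions of a serum. This is only a minimal sketch under stated assumptions: the function names and the numerical values are hypothetical, and the cutoff rule (mean of negative controls plus three standard deviations) is a common convention assumed here for illustration; the text only requires a comparison with a control.

```python
from statistics import mean, stdev


def cutoff_from_controls(negative_controls, n_sd=3.0):
    """Positivity threshold: mean of negative controls plus n_sd deviations."""
    return mean(negative_controls) + n_sd * stdev(negative_controls)


def endpoint_titre(dilution_factors, absorbances, cutoff):
    """Highest dilution factor still read above the cutoff, or None.

    Dilutions are scanned from the least to the most diluted; the scan stops
    at the first reading that does not exceed the cutoff.
    """
    titre = None
    for factor, absorbance in sorted(zip(dilution_factors, absorbances)):
        if absorbance <= cutoff:
            break
        titre = factor
    return titre


# Hypothetical readings for one serum and three negative control wells.
cutoff = cutoff_from_controls([0.05, 0.07, 0.06])
titre = endpoint_titre([100, 200, 400, 800, 1600],
                       [1.20, 0.80, 0.40, 0.12, 0.05],
                       cutoff)
```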

The invention also relates to the diagnostic kits
themselves for the in vitro detection of antibodies
against the LAV virus, which kits comprise any of
the polypeptides identified herein, and all the
biological and chemical reagents, as well as
equipment, necessary for performing diagnostic assays.
Preferred kits comprise all reagents required for
carrying out ELISA assays. Thus preferred kits will
include, in addition to any of said polypeptides,
suitable buffers and anti-human immunoglobulins,
which anti-human immunoglobulins are labelled
either by an immunofluorescent molecule or by an
enzyme. In the last instance preferred kits then
also comprise a substrate hydrolysable by the
enzyme and providing a signal, particularly modified
absorption of a radiation, at least in a determined
wavelength, which signal is then indicative of the
presence of antibody in the biological fluid to be
assayed with said kit.

It can of course be of advantage to use several
proteins or polypeptides not only of both LAVeli
and LAVmal, but also of either or both of them
together with homologous proteins or polypeptides
of earlier described viruses, e.g. of LAVbru or
HTLV-3, or ARV, etc.

The invention also relates to vaccine
compositions whose active principle is to be constituted by
any of the antigens, i.e. the hereabove disclosed
polypeptides or whole antigens, of either LAVeli or
LAVmal, or both, particularly the purified gp110 or
immunogenic fragments thereof, fusion
polypeptides or oligopeptides, in association with a suitable
pharmaceutically or physiologically acceptable
carrier.

A first type of preferred active principle is the
gp110 immunogen of said immunogens.

Other preferred active principles to be
considered in that field consist of the peptides
containing less than 250 aminoacid units, preferably less
than 150, particularly from 5 to 150 aminoacid
residues, as deducible from the complete genomes of
LAVeli and LAVmal, and even more preferably those
peptides which contain one or more groups
selected from Asn-X-Ser and Asn-X-Thr as defined
above. Preferred peptides for use in the production
of vaccinating principles are peptides (a) to (f) as
defined above. By way of example having no
limitative character, there may be mentioned that
suitable dosages of the vaccine compositions are
those which are effective to elicit antibodies in vivo,
in the host, particularly a human host. Suitable
doses range from 10 to 500 micrograms of
polypeptide, protein or glycoprotein per kg, for instance
50 to 100 micrograms per kg.
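
Peptides carrying the Asn-X-Ser type groups mentioned above (the potential N-glycosylation sites, written N-X-S/T in the legend of Figure 5) can be located in a protein sequence with a simple pattern search. The following is a sketch under stated assumptions: the function name and the test sequence are hypothetical, X is taken as any residue as in the text, and the optional exclusion of proline at position X reflects common practice rather than anything stated in this document.

```python
import re


def find_sequons(protein, exclude_proline=False):
    """Return the 1-based positions of Asn residues in N-X-S/T motifs.

    A lookahead is used so that overlapping motifs (e.g. N-N-T-S) are all
    reported.
    """
    x = "[^P]" if exclude_proline else "."
    pattern = re.compile(f"N(?={x}[ST])")
    return [m.start() + 1 for m in pattern.finditer(protein.upper())]


# Hypothetical one-letter sequence.
sites = find_sequons("MNGSANPTNNTS")
strict_sites = find_sequons("MNGSANPTNNTS", exclude_proline=True)
```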

The different peptides according to this
invention can also be used themselves for the
production of antibodies, preferably monoclonal antibodies
specific for the different peptides respectively. For
the production of hybridomas secreting said
monoclonal antibodies, conventional production and
screening methods are used. These monoclonal
antibodies, which themselves are part of the
invention, then provide very useful tools for the
identification and even determination of relative proportions
of the different polypeptides or proteins in
biological samples, particularly human samples containing
LAV or related viruses.

The invention further relates to the hosts
(procaryotic or eucaryotic cells) which are
transformed by the above mentioned recombinants and
which are capable of expressing said DNA
fragments.

Finally the invention also concerns vectors for
the transformation of eucaryotic cells of human
origin, particularly lymphocytes, the polymerases of
which are capable of recognizing the LTRs of LAV.
Particularly said vectors are characterized by the
presence of a LAV LTR therein, said LTR being
then active as a promoter enabling the efficient
transcription and translation in a suitable host of a
DNA insert coding for a determined protein placed
under its control.

It must be understood that the claims which
follow are also intended to cover all equivalents of
the products (glycoproteins, polypeptides, DNAs,
etc.), whereby an equivalent is a product, i.e. a
polypeptide, which may differ from a determined
one defined in any of said claims, say through one
or several amino acids, while still having
substantially the same immunological or immunogenic
properties. A similar rule of equivalency shall
apply to the DNAs, it being understood that the
rule of equivalency will then be tied to the rule
of equivalency pertaining to the polypeptides which
they encode.


EP 0 253 701 B1 




It will also be understood that all the literature
referred to hereinbefore or hereinafter, and all
patent applications or patents not specifically
identified herein but which form counterparts of
those specifically designated herein, must be
considered as incorporated herein by reference.

It should further be mentioned that the invention
further relates to immunogenic compositions
preferably containing not only any of the
polypeptides more specifically identified above,
which have the amino acid sequences of LAVeli and
LAVmal that have been identified, but also the
peptide sequences corresponding to previously
defined LAV proteins.

In that respect the invention relates more
particularly to the particular polypeptides which
have the sequences corresponding more specifically
to the LAVbru sequences which have been referred to
earlier, i.e. the sequences extending between the
following first and last amino acids of the LAVbru
proteins themselves, i.e. the polypeptides having
sequences contained in the LAVbru OMP or
LAVbru TMP or sequences extending over both,
particularly those extending between the following
positions of the amino acids included in the
env open reading frame of the LAVbru genome:
1-530
34-530
and more preferably
531-877, particularly
680-700
37-130
211-289
488-530
490-620.

These different sequences can be used for any 
of the above defined purposes and in any of the 
compositions which have been disclosed. 
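
As an illustration of how the position ranges listed above select
peptides from the env open reading frame, the following minimal
sketch slices a sequence by 1-based, inclusive residue positions.
The helper name is hypothetical, and the sequence used here is an
arbitrary 877-residue placeholder, not the LAVbru env sequence.

```python
# Ranges as listed above: (first, last) residue, 1-based and inclusive.
ENV_RANGES = [
    (1, 530), (34, 530), (531, 877), (680, 700),
    (37, 130), (211, 289), (488, 530), (490, 620),
]


def extract_region(env: str, first: int, last: int) -> str:
    """Return residues first..last (1-based, inclusive) of an env sequence."""
    if not 1 <= first <= last <= len(env):
        raise ValueError(f"range {first}-{last} lies outside 1-{len(env)}")
    return env[first - 1:last]


# Placeholder only: a repeated alphabet trimmed to 877 residues.
placeholder_env = ("ACDEFGHIKLMNPQRSTVWY" * 44)[:877]

for first, last in ENV_RANGES:
    peptide = extract_region(placeholder_env, first, last)
    print(f"{first}-{last}: {len(peptide)} residues")
```

Note that the range 490-620 straddles position 530/531, so the
corresponding peptide extends over both of the 1-530 and 531-877
stretches.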

Finally the invention also relates to the different
antibodies which can be formed specifically against
the different peptides which have been disclosed
herein, particularly to the monoclonal antibodies
which recognize them specifically. The corresponding
hybridomas, which can be formed starting from
spleen cells previously immunized with such
peptides, fused with appropriate myeloma cells and
selected according to standard procedures, also
form part of the invention.

Phage λ clone E-H12 derived from LAVeli infected
cells has been deposited at the "Collection
Nationale des Cultures de Micro-organismes"
(National Collection of Cultures of Microorganisms)
(CNCM) of the Pasteur Institute of Paris, France,
under No. I-550 on May 9th, 1986.

Phage λ clone M-H11 derived from LAVmal
infected cells has been deposited at the CNCM
under No. I-551 on May 9th, 1986.

J.Exp.Med. 132. 211-250.
Claims 

1. The virus LAVeli whose RNA corresponds to
the cDNA of figs. 7A-7J.

2. The virus LAVmal whose RNA corresponds to 
the cDNA of figs. 8A-8I. 

3. The DNA, such as cDNA, of figs. 7A-7J.

4. The DNA, such as cDNA, of figs. 8A-8I. 

5. A DNA recombinant containing the DNA of
claim 3 or 4.

6. A probe containing a cloned nucleic acid of
any one of claims 3 to 5.

7. A method for identifying the presence or absence
in a host tissue of LAVeli or LAVmal or a
provirus thereof which comprises hybridizing
DNA obtained from said tissue with a probe of
claim 6 and detecting the presence or absence
of said virus or provirus in said tissue depending
upon whether or not there is hybridization
with said probe.

8. A protein or glycoprotein which is encoded by
an open reading frame of the DNA of claim 3
or 4 or a part of the protein or glycoprotein
which corresponds to the stretch extending
from aminoacyl residue 37 to aminoacyl residue
130, or from aminoacyl residue 211 to
aminoacyl residue 289, or from aminoacyl residue
488 to aminoacyl residue 530 of fig. 3.

9. A part of the protein or glycoprotein of claim 8
which corresponds to the stretch extending
from the aminoacyl residue 490 to the
aminoacyl residue 620 of fig. 3.

10. A part of the protein or glycoprotein of claim 8
whose amino acid sequence consists essentially
of all or part of the sequences which
follow:

OMP or gp110 proteins, including precursors:
1 to 530

OMP or gp110 without precursor:
34-530

Sequence carrying the TMP or gp41 protein:
531-877, particularly
680-700

well conserved stretches of OMP:
37-130,
211-289 and
488-530

or well conserved stretch found at the end of
the OMP and the beginning of TMP:
490-620.

11. A method for the in vitro detection of the
presence of antibodies directed against LAVeli
or LAVmal in a human body fluid which comprises:
contacting said body fluid with antigens
obtained from the viruses of claim 1 or 2 or
consisting of a protein, glycoprotein or part
thereof of any one of claims 8 to 10; and
detecting the immunological reaction between
said antigens and said antibodies.

12. The method of claim 11 which comprises the 
steps of: 

- depositing a predetermined amount of 
one or several of said antigens in the 
cups of a titration microplate; 

- introducing increasing dilutions of the
biological fluid, i.e., serum, to be diagnosed
into these cups;

- incubating the microplate; 

- washing carefully the microplate with an 
appropriate buffer; 

- adding into the cups specific labelled 
antibodies directed against blood im- 
munoglobulins; and 

- detecting the antigen-antibody complex 
formed, which is then indicative of the 
presence of said antibodies in the bio- 
logical fluid. 

13. A diagnostic kit for the in vitro detection of
antibodies against the virus of claim 1 or 2 or
both, which contains an antigen obtained from
said virus or consisting of a protein,
glycoprotein or part thereof of any of claims 8
to 10, and the biological and chemical
reagents, as well as equipment, necessary for
performing diagnostic assays.

14. An immunogenic composition containing an
antigen of the virus of claim 1 or 2 or any
immunogenic protein, glycoprotein or part
thereof of any of claims 8 to 10 in association
with a pharmaceutically and/or physiologically
acceptable carrier.

15. The immunogenic composition of claim 14
wherein the immunogenic glycoprotein or part
thereof is the gp110 envelope glycoprotein or
part thereof.

16. The immunogenic composition of claim 14
which contains the part of the protein or
glycoprotein of claim 10.

17. The antibodies, particularly monoclonal
antibodies, formed against any of the proteins,
glycoproteins or parts thereof of any of claims
8 to 10.

18. The cells transformed with the DNA recom- 
binant of claim 5. 

19. A process for making the DNA of claim 3 or 4,
comprising the step(s) of:

- isolating the DNA of LAVeli or LAVmal; or

- cleaving the DNA of LAVeli or LAVmal
with a suitable restriction enzyme and
subsequently recovering the parts thereof; or

- isolating the DNA from cells transformed
with the DNA of LAVeli or LAVmal in
accordance with European patent application
178 978 of September 17, 1985.

20. A process for making the protein, glycoprotein
or part thereof of any of claims 8 to 10
comprising the step(s) of:

- expressing the DNA of claim 19 in a cell
transformed therewith and then recovering
the protein, peptide or part thereof; or

- synthesizing the protein, peptide or part
thereof in homogeneous solution by successively
condensing successive amino
acids or successive peptide fragments in
an appropriate order.
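
The titration described in claim 12 (increasing dilutions of the
serum, followed by detection of the antigen-antibody complex) is
often summarized as an end-point titer. The sketch below is purely
illustrative and rests on assumptions not found in the claims: a
two-fold series starting at 1:100, a numeric signal per cup, and an
arbitrary cutoff; all names and readings are hypothetical.

```python
def dilution_series(start: int, factor: int, cups: int) -> list[int]:
    """Reciprocal dilutions, e.g. 100, 200, 400 for a two-fold series."""
    return [start * factor ** i for i in range(cups)]


def endpoint_titer(dilutions: list[int], signals: list[float], cutoff: float):
    """Highest dilution still above the cutoff, reading from the first cup.

    Returns None when even the first cup is at or below the cutoff.
    """
    titer = None
    for dilution, signal in zip(dilutions, signals):
        if signal <= cutoff:
            break
        titer = dilution
    return titer


# Hypothetical readings for four cups of one serum.
dilutions = dilution_series(start=100, factor=2, cups=4)
signals = [1.20, 0.80, 0.30, 0.10]
print(dilutions, endpoint_titer(dilutions, signals, cutoff=0.25))
```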

Patentansprüche

1. LAVeli-Virus, dessen RNA der cDNS der
Fig. 7A-7J entspricht.

2. LAVmal-Virus, dessen RNA der cDNS der
Fig. 8A-8I entspricht.

3. DNS sowie cDNS der Fig. 7A-7J.

4. DNS sowie cDNS der Fig. 8A-8I.

5. Rekombinante DNS, enthaltend DNS nach
Anspruch 3 oder 4.

6. Sonde, enthaltend eine geklonte Nukleinsäure
nach einem der Ansprüche 3 bis 5.

7. Verfahren zur Feststellung der An- oder
Abwesenheit von LAVeli oder LAVmal oder eines
Provirus davon in einem Gast-Gewebe, welches
die Hybridisierung der aus dem Gewebe
erhaltenen DNS mit einer Sonde des Anspruchs 6
und die Feststellung der An- oder
Abwesenheit des Virus oder Provirus in dem
Gewebe in Abhängigkeit davon, ob eine
Hybridisierung mit der Sonde stattfindet oder nicht,
umfaßt.

8. Protein oder Glycoprotein, welches durch ein
offenes Ableseraster der DNS von Anspruch 3
oder 4 codiert wird, oder ein Teil eines Proteins
oder Glycoproteins, das einem Abschnitt
entspricht, der sich vom Aminoacylrest 37 bis
Aminoacylrest 130 oder vom Aminoacylrest
211 bis Aminoacylrest 289 oder vom
Aminoacylrest 488 bis Aminoacylrest 530 der Fig. 3
erstreckt.

9. Protein- oder Glycoproteinteil nach Anspruch
8, das einem Abschnitt entspricht, der sich
vom Aminoacylrest 490 bis zum Aminoacylrest
620 der Fig. 3 erstreckt.

10. Protein- oder Glycoproteinteil nach Anspruch
8, dessen Aminosäurensequenz sich im
wesentlichen aus allen oder einem Teil der
folgenden Sequenzen zusammensetzt:

OMP oder gp110-Proteine einschließlich der
Vorstufen:
1 bis 530

OMP oder gp110 ohne Vorstufen:
34 bis 530

Sequenz, die TMP- oder gp41-Protein trägt:
531 bis 877, insbesondere
680 bis 700

gut erhaltene Abschnitte von OMP:
37 bis 130,
211 bis 289 und
488 bis 530

oder gut erhaltene Abschnitte, die sich am
Ende des OMP's oder am Anfang des TMP's
finden:
490 bis 620.

11. Verfahren zur in vitro-Bestimmung der
Anwesenheit von Antikörpern gegen LAVeli oder
LAVmal in menschlichen Körperflüssigkeiten,
die umfaßt: Kontaktierung der Körperflüssigkeit
mit Antigenen, die aus den Viren nach
Anspruch 1 oder 2 erhalten wurden, oder aus
Proteinen, Glycoproteinen oder Teilen davon
nach einem der Ansprüche 8 bis 10 bestehen,
und Erfassung der immunologischen Reaktion
zwischen Antigenen und Antikörpern.

12. Verfahren nach Anspruch 11, das die folgenden
Schritte umfaßt:

- Einbringen einer vorbestimmten Menge
eines oder mehrerer der genannten Antigene
in die Näpfchen einer Microtiterplatte;

- Zugabe ansteigender Verdünnungen der
biologischen Flüssigkeit, beispielsweise
Serum, das in den Näpfchen untersucht
werden soll;

- Inkubierung der Microplatte;

- vorsichtiges Waschen der Microplatte mit
einem geeigneten Puffer;

- Hinzufügung spezifischer markierter
Antikörper gegen Blutimmunoglobuline in die
Näpfchen und

- Erfassung des gebildeten
Antigen-Antikörperkomplexes, welcher einen Hinweis
auf die Anwesenheit besagter Antikörper
in der biologischen Flüssigkeit ist.

13. Diagnostische Ausstattung zur in vitro-Bestimmung
von Antikörpern gegen die Viren nach
Anspruch 1 und/oder 2, die ein Antigen enthält,
das aus besagtem Virus erhalten wurde oder
das aus einem Protein, Glycoprotein oder Teil
davon nach einem der Ansprüche 8 bis 10
besteht, und die biologischen und chemischen
Reagentien sowie Ausrüstungen, die zur
Durchführung des diagnostischen Tests
erforderlich sind.



14. Immunogene Zusammensetzung, enthaltend
ein Antigen des Virus nach Anspruch 1 oder 2,
oder irgendein immunogenes Protein, Glycoprotein
oder Teil davon nach einem der Ansprüche
8 bis 10 in Verbindung mit einem
pharmazeutisch und/oder physiologisch
akzeptablem Träger.

15. Immunogene Zusammensetzung nach Anspruch
14, worin das immunogene Glycoprotein
oder Teil davon das gp110 Hüll-Glycoprotein
oder ein Teil davon ist.

16. Immunogene Zusammensetzung nach Anspruch
14, die ein Teil des Proteins oder
Glycoproteins nach Anspruch 10 enthält.

17. Antikörper, insbesondere monoklonale
Antikörper, die gegen irgendeines der Proteine,
Glycoproteine oder Teile davon nach einem der
Ansprüche 8 bis 10 gebildet wurden.

18. Transformierte Zellen mit rekombinanter DNS
nach Anspruch 5.

19. Verfahren zur Herstellung der DNS nach
Anspruch 3 oder 4, umfassend den Schritt/die
Schritte:

- Isolierung der DNS von LAVeli oder
LAVmal oder

- Spaltung der DNS von LAVeli oder
LAVmal mit einem geeigneten
Restriktionsenzym und anschließende
Gewinnung der erhaltenen Teile oder

- Isolierung der DNS aus Zellen, die mit
der DNS von LAVeli oder LAVmal
entsprechend der europäischen
Patentanmeldung 178 978 vom 17. September
1985 transformiert wurden.

20. Verfahren zur Herstellung des Proteins,
Glycoproteins oder Teilen davon nach einem der
Ansprüche 8 bis 10, umfassend den Schritt/die
Schritte:

- Expression der DNS nach Anspruch 19
in einer damit transformierten Zelle und
Gewinnung des Proteins, Peptids oder
eines Teiles davon oder

- Synthese der Proteine, Peptide oder Teile
davon in einer homogenen Lösung
durch aufeinanderfolgende Kondensation
der aufeinanderfolgenden Aminosäuren
oder Peptidfragmente in einer geeigneten
Reihenfolge.

Revendications

1. Le virus LAVeli dont l'ARN correspond à
l'ADNc des Figures 7A-7J.

2. Le virus LAVmal dont l'ARN correspond à
l'ADNc des Figures 8A-8I.

3. L'ADN, tel que l'ADNc, des Figures 7A-7J.

4. L'ADN, tel que l'ADNc, des Figures 8A-8I.

5. ADN recombinant contenant l'ADN tel que
défini à l'une des revendications 3 et 4.

6. Sonde contenant un acide nucléique cloné tel
que défini à l'une quelconque des
revendications 3 à 5.

7. Procédé pour identifier la présence ou
l'absence, dans un tissu hôte, du LAVeli ou LAVmal,
ou d'un provirus de ceux-ci, qui comprend :

- l'hybridation de l'ADN obtenu à partir
dudit tissu avec une sonde telle que
définie à la revendication 6 ; et

- la détection de la présence ou de
l'absence dudit virus ou dudit provirus dans
ledit tissu selon qu'il y ait ou non
hybridation avec ladite sonde.

8. Protéine ou glycoprotéine qui est codée par un
cadre de lecture ouvert de l'ADN tel que défini
à l'une des revendications 3 et 4, ou partie de
la protéine ou glycoprotéine qui correspond au
segment s'étendant à partir du résidu
aminoacyle 37 au résidu aminoacyle 130, ou du
résidu aminoacyle 211 au résidu aminoacyle 289,
ou du résidu aminoacyle 488 au résidu
aminoacyle 530 de la Figure 3.

9. Partie de la protéine ou glycoprotéine selon la
revendication 8, qui correspond au segment
s'étendant à partir du résidu aminoacyle 490
au résidu aminoacyle 620 de la Figure 3.

10. Partie de la protéine ou glycoprotéine selon la
revendication 8, dont la séquence d'acides
aminés consiste essentiellement en la totalité
ou une partie des séquences qui suivent :

- protéines OMP ou gp110, incluant les
précurseurs :
1 à 530

- OMP ou gp110 sans précurseur :
34-530

- séquence portant la protéine TMP ou
gp41 :
531-877, en particulier
680-700

- les segments bien conservés d'OMP :
37-130,
211-289 et
488-530 ; ou

- le segment bien conservé que l'on trouve
à l'extrémité de l'OMP et au début de la
TMP :
490-620.

11. Procédé pour la détection in vitro de la
présence d'anticorps dirigés contre LAVeli ou
LAVmal, dans un fluide corporel humain, qui
comprend :

- la mise en contact dudit fluide corporel
avec des antigènes obtenus à partir des
virus tels que définis à l'une des
revendications 1 et 2, ou consistant en une
protéine, glycoprotéine, ou partie de
celles-ci, telle que définie à l'une
quelconque des revendications 8 à 10 ; et

- la détection de la réaction immunologique
entre lesdits antigènes et lesdits
anticorps.

12. Procédé selon la revendication 11, qui
comprend les étapes consistant à :

- déposer une quantité prédéterminée de
l'un ou plusieurs desdits antigènes dans
les cuvettes d'une plaque de
microtitration ;

- introduire dans ces cuvettes des dilutions
croissantes du fluide biologique, à savoir,
du sérum, à diagnostiquer ;

- faire incuber la microplaque ;

- laver soigneusement la microplaque avec
un tampon approprié ;

- ajouter dans les cuvettes des anticorps
marqués spécifiques, dirigés contre les
immunoglobulines du sang ; et

- détecter le complexe antigène - anticorps
formé, qui dénote alors la présence
desdits anticorps dans le fluide biologique.

13. Kit de diagnostic pour la détection in vitro
d'anticorps dirigés contre le virus tel que défini
à l'une des revendications 1 et 2 ou aux deux,
qui contient un antigène obtenu à partir dudit
virus ou consistant en une protéine,
glycoprotéine, ou partie de celles-ci, telle que
définie à l'une quelconque des revendications
8 à 10, et les réactifs biologiques et chimiques,
ainsi que l'équipement, nécessaires pour
effectuer des essais diagnostiques.

14. Composition immunogène contenant un
antigène du virus tel que défini à l'une des
revendications 1 et 2, ou n'importe quelle protéine,
glycoprotéine ou partie de celles-ci,
immunogène, telle que définie à l'une quelconque des
revendications 8 à 10, en association avec un
support pharmaceutiquement et/ou
physiologiquement acceptable.

15. Composition immunogène selon la
revendication 14, dans laquelle la glycoprotéine ou
partie de celle-ci, immunogène, est la
glycoprotéine d'enveloppe gp110 ou une partie de
celle-ci.

16. Composition immunogène selon la
revendication 14, qui contient la partie de la protéine ou
glycoprotéine telle que définie à la
revendication 10.

17. Anticorps, particulièrement anticorps
monoclonaux, formés contre l'une quelconque des
protéines, glycoprotéines, ou parties de celles-ci,
telles que définies à l'une quelconque des
revendications 8 à 10.

18. Cellules transformées par l'ADN recombinant
tel que défini à la revendication 5.

19. Procédé de préparation de l'ADN tel que défini
à l'une des revendications 3 et 4, comprenant
la (ou les) étape(s) consistant à :

- isoler l'ADN du LAVeli ou LAVmal ; ou

- cliver l'ADN du LAVeli ou LAVmal avec
une enzyme de restriction appropriée et,
par la suite, récupérer les parties de
celui-ci ; ou

- isoler l'ADN à partir de cellules
transformées par l'ADN du LAVeli ou LAVmal,
conformément à la demande de brevet
européen 178 978 du 17 septembre
1985.

20. Procédé de fabrication de la protéine,
glycoprotéine ou partie de celles-ci, telle que définie
à l'une quelconque des revendications 8 à 10,
comprenant la (ou les) étape(s) consistant à :

- exprimer l'ADN tel que défini à la
revendication 19 dans une cellule transformée
par celui-ci, puis récupérer la protéine, le
peptide, ou une partie de ceux-ci ; ou

- synthétiser la protéine, le peptide, ou une
partie de ceux-ci, en solution homogène,
par condensation successive des acides
aminés successifs ou des fragments
peptidiques successifs dans un ordre
approprié.

LAV ELI

iggtctctctggttagaccagatttcagcctgggagctctctggctagctagggaacccac 
tgcttaagcctcaataaagcttgccttgagtgcttcaaIgtagtgtgtgcccgtctgttgt 

100 , 

gtgactctggtaactagagatccctcagacccctttagtcagagtggaaaatctctagca 



u 



gtpgcgcccgaacagggacctgaaagcgaaagtagaaccagaggagctctctcgacgcag 

200 • • • • 

GAC7CGGC7TGCTGAAGCGCGCACGGCAAGAGGCGAGGQGCA6C6ACTGGT6AGTACGC7 

j-^GAGL • 300 

• HetGlyAUArsAlaSerVclLeuSer 

AAAATT7T7GACTAGCGGAGGCTAGAAGGAGAGAGATGGGTGCGAGAGCGTCAG7ATTAA 

• • • • • ' • 

G lyGlyLy sLeuAspLy s7rpG luLysIl eArgLeuArsProG lyGlyLy sLysLy s7yr 
GCGGGGGAAAA77AGATAAA7GGGAAAAAA77CGG77ACGGCCAGGAGGAAAGA.AAAAA7 

400 • • 

ArgLeuLysKisIleVa 17rpAlaSerArgGluLeuGluArg7yrAlaLeuAscProGly 
A7AGAC7AAAACA7A7AG7A7GGGCAAGCAGGGAGC7AGAACGATATGCAC7TAA7CC7G 

• • • • • ' • 
LeuLeuGIu7hrSerGluGlyCysLysGlnIleIIeGlyGlnLeuGlnProAlaIleGIn 

GCC7777AGAAACA7CAGAAGGC7G7AAACAAA7AA7AGGGCAGC7ACAACCAGC7A7TC 
500 • • • . 

7hrGly7hrGluGluLeuArgSerLeu7yrA6n7hrValAla7hrLeu7yrCysValIli8 
AGACAGGAACAGAAGAAC77AGA7CA77A7A7AA7ACAG7AGCAACCC7C7A77G7G7AC 

• , • 600 

LysGlylleAspValLysAsplhrLysGluAlaLeuGluLystletGluGluGluGInAsn 
A7AAAGGAA7AGA7G7AAAAGACACCAAGGAAGC777AGAAAAGATGGAGGAAGAGCAAA 

• ••••• 
LysSerLysLysLysAlaGlnG InAlaAlaAl aAsp7hrG lyAsnAsnSerG InValSer 

ACAAAAG7AAGAAAAAGGCACAGCAAGCAGCAGC7GACACAGGAAACAACAGCCAGGTCA 

700 

GlnA5n7yrProIleVaIGlnAsnLeuGlQGlyGlnHecValHLsGlnAlaIleSerPro 
GCCAAAA77A7CC7A7AG7GCAGAACC7ACAGGGGCAAA7GGTACA7CAGGCCATATCAC 

• •«••• 
Arg7hrLeuAsnAIa7rpValLysValIleGluGluLy6AlaPheSerProGluValIle 

C7AGAAC777GAACGCA7GGG7AAAAG7AA7AGAAGAAAAGGC777CAGCCCAGAAG7AA 
800 • • • • 

ProMetPheSerAlaLeu£erGluGlyAla7hrProGlnAspLeuAsn7hrHetLeuA8n 
7ACCCA7G7777CAGCA77A7CAGAAGGAGCCACCCCACAAGA7T7AAACACCA7GC7AA 

.900 

7hrValGlyGlyHisGlnAlaAlaMetGlDMetLeuLy8GIuThrIleA8nGluGluAla 
ACACAG7GGGGGGACA7CAAGCAGCCA7GCAAATGC7AAAAGAGACCATCAATGAAGAAG 

• ••••• 
Ala61u7rpA8pArgLeufiisProVaIHisAlaGlyProIleAlaPro61yGlntIetArg 

C7GCAGAA7GGGA7AGG77ACA7CCAG7GCA7GCAGGGCC7A77GCACCAGGCCAGATGA 

1000 

GluProArgGlySerAspIleAIaGly7hr7hrSer7hrLeuGlDGluGInIleAla7rp 
GAGAACCAAGGGGAAG7GA7A7AGCAGGAAC7ACTAG7ACCC77CAGGAACAAA7AGCA7 

• ••••• 
Het7hrSerAsnProProIleProValGIyGluIIe7yrLy8Arg7rpIleIleValGly 

GGA7GACAAG7AACCCAGC7A7CGGAG7AGGAGAAA7C7A7AAAAGA7GGATAAT7GTGG 
' . 1100- . . . 

LeuA8nLy8lleValArgHet7yrSerProValSerIIeLeuA8pIIeArgGlaGlyPro 
GA77AAA7AAAA7AG7AAGAA7G7A7AGCCCTGTCAGCATTTTGGACATAAGACAGGGAC 

1200

LysGluProPheArgAspTyrValAspArgPheTyrLysThrLeuArgAlaGluGlnAla
CAAAGGAACCTTTTAGAGACTATGTAGACCGGTTCTATAAAACTCTAAGAGCCGAGCAAC 

• « • • • * • 
SerGlnAspValLysAsnTrpiletThrG luThrLeuLeuValGlnAsnAlaAsnProAsp 

CTTCACACGATCTAAAAAATTCGATCACAGAAACCTTGTTCGTCCAAAATGCAAACCCAG 

1300 

CysLysThrlleLeuLysAlaLeuGlyProGlnAlaThrLeuGluGluHetMetThrAla 
ATTGCAAGACTATCTTAAAAGCATTGGGACCACAGGCTACACTAGAAGAAATGATGACAG 

• « • . • • • 
CysGInGlyValGlyGlyProSerUisLysAIaArsValLeuAlaGluAlaKetSerGln 

CATGTCAGGGAGTGGGGGGGCCCAGCCATAAAGCAAGAGTTCTGGCTGAGGCAATGAGCC 
1400 • 

AlaThrAsnSerValThrThrAlaHetMetGlnArsGlyAsnPheLysGlyProArgLya 
AAGCAACAAATTCAGTTACTACAGCAATGATGCAGAGAGGCAATTTTAAGGGCCCAACAA 

1500 

IlelleLysCysPheAsnCysG ly Ly sG luG lyHis II eAlaLysAsnCy sArgAlaPr o 
AAATTATTAAGTGTTTCAATTGTGGCAAAGAAGGGCACATAGCAAAAAATTGCAGGGCCC 

• • • • • • 
ArgLysLysGlyCysTrpArgCysGlyLysGluGlyHisGlnLeuLysAspCysThrGlu 

CTAGGAAAAAGGGCTGTTGGAGATGTGGAAAGGAAGGACACCAACTAAAAGATTGCACTG 
P^POL • • ^600 

iPhePheArgGluAsnLeuAlaPheProGlnGlyLysAlaGlyGluLeu 
ArgGlnAlaAsnPheLeuGlyArglleTrpProSerHisLysGlyArgProClyAsnPhe 
AGAGACAGGCTAATTTTTTAGGGAGAATTTGGCCTTCCCACAAGGGAAGGCCGGGGAACT 

• ••••• 
SerProLysGlnThrArgAlaAsnSerProThrSerArgGluLeuArgValTrpGlyArg 

LeuGlnSerArgProGluProThrAlaProProAlaGluSerPheGlyPheGlyCluGlu 
TTCTCCAAAGCAGACCAGAGCCAACAGCCCCACCAGCAGAGAGCTTCGGGTTTGGGGAAG 
1700 • • • • 

AspAsnProLeuSerLysThrGlyAlaGluArgGlnGlyThrValSerPheAsnPhePro 
IleThrProSerGlnLysGlnGluGlnLysAspLysGluLeuTyrProLeuThrSerLeu 
AGATAACCCCCTCTCAAAAACAGGAGCAGAAAGACAAGGAACTGTATCCTTTAACTTCCC 

•GAGf^ • • ^^^^ 

GlnlleThrLeuTrpG InAr gPr oLeu Va lAlaileLys IleG lyGlyGlnLeuLy sGlu 
Ly s Ser LeuPheGIyAsnAspProLeuSerGlnl 

tcaaatcactctttggcaacgaccccttgtcgcaaItaaaaatagggggacagctaaagga 

• ••••• 

AlaLeuLeuAspThrGlyAlaAspAspThrValLeuGluGluUetAsnLeuProGlyLys 

agctcxattagatacaggagcagatgatacagtattagaagaaatgaatttgccaggaaa 

1900 

TrpLysProLyslIet IleGlyGlylleGlyGlyPhelleLysValArgGlnXyrAspGln 

atggaaaccaaaaatgatagggggaattggaggitxtatcaaagtaagacagtatgatca 

• ••••• 

IleProIleGluIleCysGlyGlnLysAlalleGlyThrValLeuValGlyProThrPro 

aatacccatagaaatctgtggacagaaagctataggtacagtattagtaggacctacgcc 

• 2000 • • • • 

• ValAsnllelleG lyArgAsnLeuLcuTbrGlnlleGlyCysThrLeuAsnPheProlle 

tgtcaacataatcggaagaaatttgttgacccagattggctgcactttaaattttccaat 

2100 

SerProIleGluThrValProValLysLcuLysProGlyMetAspGlyProLysValLys 

tagtcctattgaaactgtaccagtaaaattaaagccaggaatggatgccccaaaagttaa 

GlnTrpProLeuThrGluGluLysIleLysAlaLeuThrGIuIleCysXhrAapMetQlu 

acaaxggccaxxgacagaagaaaaaaxaaaagcaxxaacagaaaxxxgxacagaxaxgga 

2200

Ly sG luGlyLyslleSerArgl leGlyProGluAsnProTyrAsnThrProIlePheAla
AAAGGAAGGAAAAATTTCAAGAATTGGGCCTGAAAATCCATACAATACTCCAATATTTGC 

• • • V • - • 

IleLysLysLy sAspSerThrLysIrpArgLysLeuViilAspPheArgGluLeuAsnLys 
CATAAAGAAAAAAGACAGTACCAAGTGGAGAAAATTAGTAGATTTCAGAGAACTTAATAA 
2300 

ArgThrGlnAspPhelrpGluValCInLeuGlylleProEisProAIaGIyLeuLysLys 
GAGAACTCAAGATTTCTGGGAAGTTCAATTAGGAATACCGCATCCTGCAGGGCTGAAAAA 

• • • . 2400 

LysLysSerVa IThrValLeuAspValGlyAspAlaTyrPheSerValProLeuAspGlu 
GAAAAAATCAGTAACAGTACTGCATGTGGGTGATGCATATTTTTCAGTTCCCTTAGATGA 

• • ■ • • • • 
AspPheArgLysTyrThrAIaPheThrlleSerSerlleAsoAsnGluThrProGIylle 

AGATTTTAGGAAATATACCGCCTTTACCATATCTAGTATAAACAATGAGACACCAGGGAT 

2500 

ArgTy rGlnTy rAsnVa ILeuProG InG lyTrpLysGlySerProAlallePheGlnSer 
TAGATATCAGTACAATGTGCTTCCACAGGGATGGAAAGGATCACCGGCAATATTCCAAAG 

• • • • • • 
Serlle tThrLysIleLeuGluProPheArgLysGlnAsnProGluMetVallleTyrGln 

TAGCATGACAAAAATCTTAGAGCCCTTTAGAAAACAAAATCCAGAAATGGTTATCTATCA 
2600 • • • • 

TyrllecAspAspLeuTyrValGlySerAspLeuGluIleGlyGInHisArgThrLysIle 
ATACATGGATGATTTGTATGTAGGATC7GACTTAGAAATAGGGCAGCATAGGACAAAAAT 

2700 

GluLysLeuArgGluKisLeuLcuArgTrpGlyPheThrArgProAspLysLysHisGIn 
AGAGAAATTAAGAGAACATCTATTGAGGTGGGGATTTACCAGACCAGATAAAAAACATCA 

• • • • • • 
LysGluProProPbeLeuTrpHetG lyTyrGluLeuHlsProAspLysTrpThrValGln 

GAAAGAACCCCCATTTCTTTGGATGGGTTATGAACTCCATCCTGATAAATGGACAGTACA 

2800 

Serl leLy sLeuProG luLy sG luSerTrpThrValAsnAspIleGlnAsnLeuValGIu 
GTCTATAAAACTGCCAGAAAAGGAGAGCTGGACTGTCAATGATATACAGAACTTAGTGGA 

• • • • • • 
ArgLeuAsnTrpAlaSerGlnl leTyrProGlylleLysValArgGlnLeuCysLysLeu 

GAGATTAAACTGGGCAAGCCAGATTTATCCAGGAATTAAAGTAAGACAATTATGTAAACT 
2900 • , • • 

LeuArgGlyThrLysAlaLeuThrGluVallleProLeuThrGluGluAlaGIuLeuGIu 
CCTXAGGGGAACCAAAGCACTAACAGAAGTAATACCACTAACAGAAGAAGCAGAATTAGA 

3000 

LeuAlaGluAsnArgGluIleLeuLysGluProValHisGlyValTyrTyrAspProSer 
ACTGGCAGAAAACAGGGAAATTTTAAAAGAACCAGTACATGGAGTGTATTATGACCCATC 

• ••••• 
LysAspLeuIleAlaGluIleGlnLysGlnGlyHisGlyGlnTrpThrTyrGlQlleTyr 

AAAAGACTTAATAGCAGAAATACAGAAACAAGGGCACGGCCAATGGACATACCAAATTTA • 
. • • 3100 • 

GlnGluProPheLysAsnLeuLysThrGIyLysTyrAlaArglletArgGIyAIaBisThr 

TCAAGAACCATTTAAAAATCTGAAAACAGGAAAGTATGCAAGAATGAGGGGTGCCCACAC 

• • • • • • 
AsnAspValLysGlnLeuAlaGluAlaValGlnArglleSerThrCluSerlleVallle 

TAATGATGTAAAGCAATTAGCAGAGGCAGTGCAAAGAATATCCACAGAAAGCATAGTGAT 

L ^3 200 . • 

TrpGlyArgThrProLysPbeArgLeuProileGinLyaGlttThrtrpGluThrTrpTrp 
ATGGGGAAGGACTCCTAAATTTAGACTACCCATACAAAAGGAAACATGGGAAACATGGTG 

3300

AlaGluTyrTrpGlnAlaThrTrpIleProGluTrpGluPheValAsnThrProProLeu
GGCAGAGTATTGGCAAGCCACTTGGATTCCTGAGTGGGAATTTGTCAATACCCCTCCTTT 

ValLysLeuTrpTyrGlnLcuGluLysGluProIlelleGlyAlaGluThrPheTyrVal 
AGTAAAATTATGGTACCAGTTAGAGAAGGAACCCATAATAGGAGCAGAAACTTTCTATGT 

3A00 I 

AspGlyAlaAlaAsnArgGluThrLysLeuGlyLysAlaGljyTyrValThrAspArgGly 
AGATGGGGCAGCTAATAGAGAGACTAAATTAGGAAAAGCAGGATATGTTACTGACAGAGG 

• • • • • • 
ArgGInLys ValValProLeuThrAspThrThrAsnGlnLysThrGluLeuGlnAlalle 

AAGACAGAAAGTTGTCCCTTTGACTGACACGACAAATCAGAAGACTGAGTTACAAGCAAI 
3300 • • • • 

AsnLeuAlaLeuGlnAspSerGlyLeuGluValAsnlle ValThrAspSerGlnTyrAla 
TAATCTAGCCTTGCAGGATTCGGGA7TAGAAGTAAACATAGTAACAGATTCACAATATGC 

• • , 3600 

LeuGly IlelleGlnAlaGlnProAspLysSerGluSerG luLeuValAsnGlnllelle 
ATTAGGAATCATTCAAGCACAACCAGATAAGAGTGAATCAGAGTTAGTCAATCAAATAAT 

GluGlnLeuIleLysLysGluLysValTyrLeuAlaTrpValProAlaEisLysGlylle 
AGAGCAGTTAATAAAAAAGGAAAAGGTTTACCTGGCATGGGTACCAGCACACAAAGGAAT 

3700 

GlyGlyAsnG luGlnValAspLysLeuVal SerGlnGly IleArgLysValLeuPheLeu 
TGGAGGAAATGAACAAGTAGATAAATTAGTCAGTCAAGGAATCAGGAAAGTACTATTTTT 

• • • • • • 
AspGly IleAspLysAlaGlnG luG luHisG luLy sTyrHis AsnAsnTrpAr^Alalfet 

GGATGGAATAGATAAGGCTCAAGAAGAACATGAGAAATATCACAACAATTGGAGAGCAAT 
3800 . . • 

AlaSerAspPheAsnLeuProProValValAlaLysGluIleValAlaSerCysAspLya 
GGCTAGTGATTTTAACCTACCACCCGTGGTAGCAAAAGAAATAGTAGCTAGCTGTGATAA 

• • • • 3900 

CysGlnLeuLysGlyGluAlalle tHisGlyGlnValAspCysSerProGlylleTrpGIn 
ATGTCAGCTAAAAGGAGAAGCCATGCATGGACAAGTAGACTGTAGTCCAGGAATATGGCA 

• • • • • • • 

LeuAspCysThrHisLeuGluGlyLysVallleLeuValAlaValRisValAlaSerGly 

ATTAGATTGTACACACTTAGAAGGAAAAGXTATCCTGGXAGCAGTTCATGTAGCCAGTGG 

4000 • • 

TyrlleGluAlaGluVallleProAlaGluThrGlyGlnGluThrAlaTyrPheLeuLeu 
CTATATAGAAGCAGAAGTTATTCCAGCAGAAACAGGGCAGGAAACAGCATATTTTCTTTT 

• ••••• 
LysLeuAlaGlyArgTrpProValLysValValllisThrAspAsnG lySerAsnPbeThr 

AAAATTAGCAGGAAGATGGCCAGTAAAAGTAGTACATACAGACAATGGCAGCAATTTCAC 
4100 • • . • 

SerAlaAlaValLysAlaAlaCysTrpTrpAlaGlylleLysGlnGluPheGlyllePro 
CAGTGCTGCAGTTAAGGCCGCCTGTTGGTGGGCAGGTATCAAACAGGAATTTGGAATTCC 

4200 

TyrAsnProG InSerGlnGly Val ValGluSerHetAsnLysGluLeuLysLysIlelle 
CTACAATCCCCAAAGTCAAGCAGTAGTAGAATCTATGAATAAAGAATTAAAGAAAATTAT 

• ••«•• 
GlyGlnValArgAspGlnAlaGluHisLeuLysThrAlaValGlnHetAlaValPhelle 

AGGACAGGTAAGAGATCAAGCTGAACATCTTAAGACAGCAGTACAAATGGCAGTATTCAX 

. 4300 

"HTsXsnTh'eLy.'sArgArgA'rgG ly_UeG rylQ lyTyrSe:rATaG:lyG I'liAfg 1 1 e 1 1 e As p 
CCACAATTTTAAAAGAAGAAGGGGGATTGGGGGATACAGTGCAGGGGAAAGAATAATAGA

IlelleAlaThrAspIleGlnThrLysGluLeuGlnLysGlnlielleLysIleGlnAsn
CATAATAGCAACAGACATACAAACTAAAGAATTACAAAAACAAATTATAAAAATTCAAAA- 
• 4400 • • • • 

PheArgValTyrTyrArgAspSerArgAspProlleTrpLysGlyProAlaLysLeuLeu 
TTTTCGGGTTTATTACAGAGACAGCAGAGATCCAATTTGGAAAGGACCAGCAAAGCTCCT 

4500 

TrpLysGlyGluGlyAlaValVallleGlnAspLysScrAspIleLysValValProArg 
CTGGAAAGGTGAAGGGGCAGTAGTAATACAAGACAAGAGTGACATAAAGGTAGTACCAAG 

ArgLysValLysIlelleArgAsptfyrGlyLysGlnHetAlaGlyAepAspCysValAla 

/lletGluAsDArgTrpGIcVallletlleValTrpGln 
AAGAAAAGTAAAGATTATTAGGGATTATGGAAAACAGATGGCACGTGATCATTGTGTGGC 
PQL^* • 4600 



SerArgGlnAspGluAsp 
ValAspArgHet Argil 



eLysThrTrpLysSerLeuValLysHisHisKetTyrValSer 



AAGTAGACAGGATGAGGATtTAAAACATGGAAAAGTTTAGTAAAACACCATATGTATGTTT 

• ••••• 
LysLysAlaAsnArgTrpPheTyrArgKisKisTyrGluSerProKisProLysIleSer 

CAAAGAAAGCTAACAGATGGTTTTATAGACATCACTATGAAAGCCCCCACCCAAAAATAA 

4700 . . . . , 

SerGluValHislleProLeuGlyGluAlaArgLeuVallleLysThrTyrTrpGlyLeu' 
GTTCAGAAGTACACATCCCACTAGGAGAAGCTAGACTGGTAATAAAAACATATTGGGCTC 

4800 

BisThrGlyGluArgGluTrpHisLeuGlyGlnGly ValSerlleGluTrpArgLysArg 
TGCATACAGGAGAAAGAGAATGGCATCTGGGTCAGGGAGTCTCCATAGAATGGAGGAAAA 

• ••••• 
ArgTyrSerThrGlnValAspProGlyLeuAlaAspGlnLeuIleHislletTyrTyrPhe 

GGAGATATAGCACACAAGTAGACCCTGGCCTGGCAGACCAACTAATTCATATGTATTATT 

4900 

AEpCysPheSerGluSerAlalleArgLysAlallcLeuGlyAspIleValSerProArg 
TTCATTGTTTTTCAGAATCTGCTATAAGAAAAGCCATATTAGGAGATATAGTTAGTCCTA 

• ••••• 
CysGluTyrC InAlaGlyHlsAsnLysValGlySerLeuGloTyrLeuAlaLeuThrAla 

GGTGTGAGTATCAAGCAGGACATAACAAGGTAGGATCCCTACAGTATTTGGCACTAACAG 
5000 • • • • 

- LeuIleAlaProLysGlnlleLysProProLeuProSerValArgLysLeuThrGluAsp 
CATTAATAGCAGCAAAACAGATAAAGCCACCTTTGCCTAGTGTfAGGAAGCTAAGAGAAG 

-^.R . • _ • • • ^^^0 

VletGluGliiAlaProAlaAspGliiGlyProGlnArgGluProTyrAsnGluTrpAla 
ArtTrpAsnLysProGlnGlnThrArgGlyHisArgGlySerHisThrMetAsnGlyEis 
ATAGp*TGGAACAAGCCCCAGCAGACCAGGGGCCACAGAGGGAGCCATACAATGAATGGGC 

• • 

LeuGluLeuLeuGluGluLeuLysSerGluAlaValArgHisPheProArgIleTrpLeu
A^AGAGCTTTTAGAGGAGCTTAAGAGTGAAGCTGTTAGACATTTTCCTAGGATATGGCT 
. . . 5200 

HisSerLeuGlyGlnHisIleTyrGluThrTyrGlyAspThrTrpValGlyValGluAla
CCATAGCTTAGGACAACATATTTATGAAACTTATGGGGATACC7GGGTAGGAGTTGAAGC 

• • • • • • 
IleIleArgIleLeuGlnGlnLeuLeuPheIleHisPheArgIleGlyCysGlnHisSer

TATAATAAGAATACTGCAACAATTACTGTTTATTCATTTCAGAATTGGGTGTCAACATAG 
5300 . j-^S . 

ArgIleGlyIleIleArgGlnArgArgAlaArgAsnGlySerSerArgSer

MetAspProValAspProAsnLeuGlu



' CXGTOlTAlJlSXATTATT C G Air/^^ GAT C C AC T AG AT C CfTA AC CT AG 

5400 
ProTrpAsnHisProGlySerGlnProArgThrProCysAsnLysCysHisCysLysLys
AGCCCTGGAACCATCCAGGAAGTCAGCCTAGGACTCCTTGTAACAAGTGTCATTGTAAAA 

• • • • • • 

CysCysTyrHisCysProValCysPheLeuAsnLysGlyLeuGlyIleSerTyrGlyArg

AGTGTTGCTATCATTGCCCAGTTTGCTTCTTAAACAAAGGCTTACGCATCTCCTATGCCA 

5500 

LysLysArgArgGlnArgArgGlyProProGlnGlyGlyGlnAlaHisGlnValProIle

ggaagaagcggagacagcgacgaggacctcctcaaggcggtcAggctcatcaagttccta 

S^-rn • • * * • 

ProLysGlnl 

taccaaagcagpaagtagtacatgtaatgcaacctttagggataatagcaatagcagcat 

5600 • • ' • 

tagtagtagcaataatactagcaatagttgtgtggaccatagtattcatagaatatagaa 

5700 

ggataaaaaagcaaaggagaatagactgtttacttgatagaataacagaaagagcagaag 

MetArgAlaArgGlyIleGluArgAsnCysGlnAsnTrpTrpLysTrpGly

acagtggcaatgagagcgagggggatagagagaaattgtcaaaactggtggaaatggggc 

5800 

IleMetLeuLeuGlyIleLeuMetThrCysSerAlaAlaAspAsnLeuTrpValThrVal

atcatgctccttgggatattgatgacctgtagtgctgcagacaatctgtgggtcacagtt 

• ••••• 

TyrTyrGlyValProValTrpLysGluAlaThrThrThrLeuPheCysAlaSerAspAla

tattatggggtgcctgtatggaaggaagcaaccaccactctattttgtgcatcagatgct 

5900 • • • • 

LysSerTyrGluThrGluAlaHisAsnIleTrpAlaThrHisAlaCysValProThrAsp

aaatcatatgaaacagaggcacataatatctgggccacacatgcctgtgtacccacggac 

• • • • 6000 
ProAsnProGlnGluIleAlaLeuGluAsnValThrGluAsnPheAsnMetTrpLysAsn

cccaacccacaagaaatagcactggaaaatgtgacagaaaactttaacatgtggaaaaat 

• • • • • • 

AsnMetValGluGlnMetHisGluAspIleIleSerLeuTrpAspGlnSerLeuLysPro

aacatggtggaacagatgcatgaggatataatcagtttatgggatcaaagcctaaaacca 

6100 

CysValLysLeuThrProLeuCysValThrLeuAsnCysSerAspGluLeuArgAsnAsn

tgtgtaaaattaaccccactctgtgtcactxtaaactgtagtgatgaattgaggaacaat 

• . • • • • • 

GlyThrMetGlyAsnAsnValThrThrGluGluLysGlyMetLysAsnCysSerPheAsn
GGCACTATGGGGAAeAATGTCACTACAGAGGAGAAAGGAATGAAAAACTGCTCTTTCAAT 
6200 • . • • 

ValThrThrValLeuLysAspLysLysGlnGlnValTyrAlaLeuPheTyrArgLeuAsp
GTAACCACAGTACTAAAAGATAAGAAGCAGCAAGTATATGCACTTTTTTATAGACTTGAT 

* • • • 6300 
IleValProIleAspAsnAspSerSerThrAsnSerThrAsnTyrArgLeuIleAsnCys
ATAGTACCAATAGACAATGATAGTAGTACCAATAGTACCAATTATAGGTTAATAAATTGT 

• • • • • • 
AsnThrSerAlaIleThrGlnAlaCysProLysValSerPheGluProIleProIleHis
AATACCTCAGCCATTACACAGGCTTGTCCAAAGGTATCCTTTGAGCCAATTCCCATACAT 

6400 

TyrCysAlaProAlaGlyPheAlaIleLeuLysCysArgAspLysLysPheAsnGlyThr
TATTGTGCCCCAGCTGGTTTTGCGATTCTAAAGTGTAGAGATAAGAAGTTCAATGGAACA 

• • • • • • 

GlyProCysThrAsnValSerThrValGlnCysThrHisGlyIleArgProValValSer
GGCCCATGCACAAATGTCAGCACAGTACAA7GTACACATGGAATTAGGCCAGTGGTGTCA 
6500 • • • • 
ThrGlnLeuLeuLeuAsnGlySerLeuAlaGluGluGluValIleIleArgSerGluAsn
ACTCAACTGCTGTTGAATGGCAGTCTAGCAGAAGAAGAGGTCATAATTAGATCCGAAAAT 

• • • • 6600 

LeuThrAsnAsnAlaLysAsnIleIleAlaHisLeuAsnGluSerValLysIleThrCys
CTCACAAACAATGCTAAAAACATAATAGCACATCTTAATGAATCTGTAAAAATTACCTGT 
• 

AlaArgProTyrGlnAsnThrArgGlnArgThrProIleGlyLeuGlyGlnSerLeuTyr
GCAAGGCCCTATCAAAATACAAGACAAAGAACACCTATAGGACTAGGGCAATCACTCTAT 

. 6700 . o 

ThrThrArgSerArgSerIleIleGlyGlnAlaHisCysAsnIleSerArgAlaGlnTrp
ACTACAAGATCAAGATCAATAATAGGACAAGCACATTGTAATATTAGTAGAGCACAATGG 

• • • • • • 
SerLysThrLeuGlnGlnValAlaArgLysLeuGlyThrLeuLeuAsnLysThrIleIle
AGTAAAACTTTACAACAAGTAGCTAGAAAATTAGGAACCCTTCTTAACAAAACAATAATA 

6800 • • • • 

LysPheLysProSerSerGlyGlyAspProGluIleThrThrHisSerPheAsnCysGly
AAGTTTAAACCATCCTCAGGAGGGGACCCAGAAATTACAACACACAGTTTTAATTGTGGA 

6900 

GlyGluPhePheTyrCysAsnThrSerGlyLeuPheAsnSerThrTrpAsnIleSerAla
GGGGAATTCTTCTACTGTAATACATCAGGACTGTTTAATAGTACATGGAATATTAGTGCA 

• • • « • • 
TrpAsnAsnIleThrGluSerAsnAsnSerThrAsnThrAsnIleThrLeuGlnCysArg
TGGAATAATATTACAGAGTCAAATAATAGCACAAACACAAACATCACACTCCAATGCAGA 

7000 

IleLysGlnIleIleLysMetValAlaGlyArgLysAlaIleTyrAlaProProIleGlu
ATAAAACAAATTATAAAGATGGTGGCAGGCAGGAAAGCAATATATGCCCCTCCTATCGAA 

• • • • • • 
ArgAsnIleLeuCysSerSerAsnIleThrGlyLeuLeuLeuThrArgAspGlyGlyIle
AGAAACATTCTATGTTCATCAAATATTACAGGGCTACTATTGACAAGAGATGGIGGTATA 

• 7100 • • • • 
AsnAsnSerThrAsnGluThrPheArgProGlyGlyGlyAspMetArgAspAsnTrpArg
AATAATAGTACTAACGAGACCTTTAGACCTGGAGGAGGAGATATGAGGGACAATTGGAGA 

7200 

SerGluLeuTyrLysTyrLysValValGlnIleGluProLeuGlyValAlaProThrArg
AGTGAATTATATAAAtATAAGGTAGTACAAATTGAACCACTAGGAGTAGCACCCACCAGG 

• • . • • • • 
AlaLysArgArgValValGluArgGluLysArgAlaIleGlyLeuGlyAlaMetPheLeu
GCAAAGAGAAGAGTGGTGGAAAGAGAAAAAAGAGCAATAGGATTAGGAGCTATGTTCCTT 

7300 

GlyPheLeuGlyAlaAlaGlySerThrMetGlyAlaArgSerValThrLeuThrValGln
GGGTTCTTGGGAGCAGCAGGAAGCACGATGGGCGCACGGTCAG7GACGCTGACGGTACAG 

• • • • • • 
AlaArgGlnLeuMetSerGlyIleValGlnGlnGlnAsnAsnLeuLeuArgAlaIleGlu
GCCAGACAATTAATGTCTGGTATAGTGCAACAGCAAAACAATT7GCTGAGGGCTATAGAG 

7400 • • • • 

AlaGlnGlnHisLeuLeuGlnLeuThrValTrpGlyIleLysGlnLeuGlnAlaArgIle
GCGCAACAGCATCTGTXGCAACTCACGGTCTGGGGCATTAAACAGCTCCAGGCAAGAATC 

7500 

LeuAlaValGluArgTyrLeuLysAspGlnGlnLeuLeuGlyIleTrpGlyCysSerGly
'CTGGCTGTGGAAAGATACCTAAAGGATCAACAGCTCCTAGGAATTTGCGGTTGCTCTGGA 
LysHisIleCysThrThrAsnValProTrpAsnSerSerTrpSerAsnArgSerLeuAsn
AAACACATTTGCACCACTAATGTGCCCTGGAACTCTAGTTG6AGTAATAGATCTCTAAAT 

. • 7600 • • 

GluIleTrpGlnAsnMetThrTrpKecGluTrpCluArgGlulleAspAsnTyrThrGly 
GAGATTTGGCAGAACATGACCTGGATGGAGTGGGAAAGAGAAATTGACAATTACACAGGC 

• ••••• 
LeuIleTyrSerLeuIleGluGluSerGlnThrGlnGlnGluLy sAsnGluLysGluLeu 
TTAATATATAGCTTAATTGACGAATCGCAGACCCAGCAAGAAkAG'AATCAAAAAGAATTC 

7700 . . I . 

LeuG luLeuAspLysTrpAlaSerLeuTrpAsnTrpPheSer IleThrClnTrpLeuTrp 
TTGGAATTGGACAAGTGGGCAAGTTTGTGGAATTGGTTTAGCATAACACAATGGCTGTGG 

7800 

TyrlleLysIlePhellellet IlelleG lyGlyLeulleGlyLeuArglleValPheAla 
TATATAAAAATATTCATAATGATAATAGGAGGCTTGATAGGTTTAAGAATACTTTTTGCT 

• • • • • • 

ValLeuSerLeuValAsnArgVa lArgG InG ly Ty r SerProLeuSerPheG InThrLeu 
GTGCTTTCTTTAGTAAATAGAGTTAGGCAGGGATACTCACCTCTGTCGTTTCAGACCCTC 

7900 

LeuProAlaProArgGlyProAspArgProGluGlyThrGluGluGluGlyGlyGluArg 
CTCCCAGCCCCGAGGGGACCCGACAGGCCCGAAGGAACAGAAGAAGAAGGTGGAGAGCGA 

• ••••• 
GlyArgAspArgSerValArgLeuLeuAsnGlyPheSerAlaLeuIleTrpAspAspLeu 
GGCAGAGACAGATCCGTGAGATTCCTGAACGGATTCTCGGCACTTATCTGGGACGACCTC 

• 8000 • • • 
ArgSerLeuCy sLeuPheSerTy rHisArgLeuArgAspLeuIleLeuIl eAlaValArg 
CGGAGCCTGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTAATCTTAATTGCAGTGAGG 

• . . . 8100 

IleValGluLeuLeuG lyArgArgGlyTrpAspIleLeuLysTyrLiBuTrpAsnLeuLeu 
ATTGTAGAACTTCTGGGACGCAGGGGGTGGGACATCCTCAAATATCTGTGGAATCTCCTA 

• • • • • • 
GlnTyrTrpSerGlnGluLeuArgAsnSerAlaSerSerLeuPheAspAlalleAlalle 
CAGTATTGGAGTCAGGAACTGAGGAACAGTGCTAGTAGCTTGTTTGATGCCATAGCAATA 

8200 

AlaValAlaGluGlyThrAspArgVallleGluIlelleGlnArgAlaCysArgAlaVal 
GCAGTAGCTGAGGGGACAGATAGAGTTATAGAAATAATACAAAGAGCTTGCAGAGCTGTT 
, " .~~ . • W % • • 

LeuAsnlleProArgArglleArgGlnGlyLeuGluArgSerLeuLe J p-J^F 

J IlletGlyGly 

CTTAACATACCCAGAAGAATAAGACAGGC,C.XTAGAAAGGTC7TTACTrTAAAATGGGTGG. 

8300 • • • • 

LysTrpSerLysSerSer IleVclGlyTrpProAlalleArgGluArglleArgArgThr 
CAAATGGTCAAAAAGTAGTATAGTGGGATGGCCTGCTATAAGGGAAAGAATAAGAAGAAC 

8400 

AsnProAlaAlaAspGlyValGlyAlaValSerArgAspLeuGluLysEisGlyAlalle 
TAATCCAGCAGCAGATGGGGTAGGAGCAGTATCTCGAGACCTGGAAAAACATGGGCCAAT 

• ••••• 
ThrSerSerAsnThrAlaSerThrAsnAlaAspCysAlaTrpLeuGluAUGlnGluGlu 

CACAAGTAGCAATACAGCAAGTACTAATGCTGACTGTGCCTGGCTAGAAGCACAAGAAGA 

8500 

SerAspGluValGlyPheProValArgProGlnValProLeuArgProlSetThrTyrLys 
GAGCGACGAGGTGGGCTTTCCAGTCAGACCCCAGG7ACCTTTAAGACCAATGACTTACAA 

• U3 • 

GluAlaLeuAspLeuSerHisPheLeuLysGluLysGlyGlyLeuGluGlyLeuIleTrp
AGAAGCTCTAGATCTCAGCCACTTTTTAAAAGAAAAGGGGGGkCTGGAAGGGCTAATTTG 
8600 . . . 
SerLysLysArgGlnGluIleLeuAspLeuTrpValTyrAsnThrGlnGlyIlePhePro
CTCCAAAAAGAGACAAGAGATCCTTGATCTTTGGGTCTACAACACACAAGGCATCTTCCC 

• • 8700 

AspTrpGlnAsnTyrThrProGlyProGlylleArgTyrProLeuThrPheGlyTrpCys 
TGATTGGCAAAACTACACACCACGGCCAGGGATCAGATATCCACTAACCTTTGGATGGTG 
•••••• 

TyrGluLeuValProValAspProGlnGluValGluGluAspThrGluGryGluThrAsn 
CTACGAGCTAGTACCAGTTGATCCACAGGAGGTAGAAGAAGACACTGAAGGAGAGACCAA 

8800 

SerLeuLeuKisProIleCysGlnHisGlyMetGluAspProGluArgGlnValLeuLya 
CAGCTTGTTACACCCTATATGCCAGCATGGAATGGAGGACCCGGAGAGACAAGTGTTAAA 

• • • • • 
TrpArgPheAsnScrArgLeuAlaPheGluKisLysAlaArgGluMetHisProGluPhe 

ATGGAGATTTAACAGCAGACTAGCATTTGAGCACAAGGCCCGACAjCATGCATCCGGAGTT 
8900 • . . . 

TyrLysAsn 

CTACAAAAACTGATGACACCGACCTTTCTACAAGGGACTTTCCGCTGGGGACTTTCCAGG 

• • • • . 9000 
GAGGCGTGGACTGGGCGGGACTCGGGAGTGGCTAACCCTCAGATGCTGCATATAAGCAGC 

TGCTTTTTGCCTGTACTGpGTCTCTCTGGTTAGACCAGATTTGAGCCTGGGAGCTCTCTC 

9100 • B*-l 

CCTAGCTAGGGAACCCACTGCTTAAGCCTCAATAAAGCTTGCCTTGAGTGCTTCAAl 
LAV.MAL



Igctctctcttgttacaccacgtccagcccggcacctctctcgctagcaaggaacccactg 

CTTAAGCCTCAATAAAGCTTGCCTTCAGTGCCTCA^GCAGTGTGTGCCCATCTGTTGTGT 

100 . U5*X 

GACTCTGGTAACTAGAGATCCCTCAGACCACTCTAGACGGTCTAAAAATCTCTAGCAGTp 



GCGCCCGAACAGGGACTTTAAAGTGAAAGTAACAGGCACTCGAAAGCGGAAGTTCCAGAG 

200 • • * A 

AAGTTCTCTCGACGCAGGACTCGGCTTGCTGAGGTGCACACAGCAAGACGCGAGAGCGGC 

gactggtgagtacgccaatttttgactagcggaggctagaaggagaga4itgcgtccgag 

AlaSerValLeuSerGlyGlyLysLeuAspAlalrpGluLyslleArgLeuArgProGly 

agcgtcactattaagcgggggaaaattagatgcatgggacaaaattcggttaaggccagg 

.... AGO . . . .- — - 

GlyLysLysLysTyrArgLeuLysHisLeuValTrpAlaSerArgGluLeuGluArgPhe 

gggaaagaaaaaatatagactgaaacatttagtatgggcaagcagggagctggaaagatt 

. • • • • 

AlaLeuAsnProGlyLeuLcuGluThrGlyGluGlyCysGlnGlnlleMetGluGlnLeu 

cJJJc5?a1Scc?ggcc??ttagaaacacgagaaggatgtcaacaaataaiggaacagct. 

500 . . • • 

GlnSerThrLeuLysThrGlySerGluGluIleLysSerLeuTyrAsnThrValAlaTht 

ACAATCAACTCTCAAGACAGGATCAGAAGAAATTAAATCATTATATAATACAGTAGCAAC 

LeuTyrCysValHisGlnArglleAspValLysAspThrLysGluAlaLeuAspLysIl^ 
CCTCTATTGTGTACATCAAAGGATAGATCTAAAAGACACCAAGGAAGCGCTAGATAAAAT 

GloGlttlleGlnAsnLysSerArgGlnLysThrGlnGlnAlaAlaAUAlaGlnGlnAla 
AGAGGAAATACAAAATAAGAGCAGGCAAAAGACACAGCAGGCAGCAGCTGCACAGCAGGC 

700 

AlaAlaAlaThrLysAsnSerSerSerValSerClnAsnTyrProIleValGlnAsnAla 
AGCAGCTGCCACAAAAAACAGCAGCAGTGTCAGTCAAAATTACCCCATAGTGCAAAATGC 

GlnGlyGlnMetlleHisGlnAlalleSerProArgThrLeuAsnAUTrpValLysVa^ 
ACAAGGGCAAATGATACATCAGGCCATATCACCTAGGACTTTGAATGCATGGCTGAAAGT 

800 

IleGluGluLysAUPheSerProGluVallleProKetPheSerAlaLeuSerCluGly 
AATAGAAGAAAAGGCTTTCAGCCCAGAAGTCATACCCATGTTCTCAGCATTATCAGACOG 

AlaThrProGlnAspLeuAstilletMetLeuAsnlleValGlyGlyKisGlnAlaAlaUet 
GGCCACCCCACAAGATTTAAATATGATGCTGAACATAGTTGGAGGACACCAGGCAGCTAT 

. « «*, 

GlnMetLeuLysAspThrlleAsnGluGluAlaAlaAspTrpAspArgValHxsProVal 
GCAAATGTTAAAAGATACCATCAATGAGGAAGCTGCAGACTGCGACAGGGTACATCCAGT 

1000 

HisAlaClyPrplleProProGlyGlnHetArgGluProArgGlySerAspIleAlaGly 
ACATGCAGGGCCTATTCCCCCAGGCCAGATGAGAGAACCAAGAGGAAGTGACATAGCAGG 
ThrThrSerThrLeuGlnGluGlnIleGlyTrpMetThrSerAsnProProIleProVal
AACTACTAGTACCCTTCAAGAACAAATAGGATGGATGACAACCAACCCACCTATCCCAGT 
1100 • • I . 

ClyAspIlcTyrLysArgTrpIlelleLeuGlyLeuAsnLysIleValArgMctTyrScr 
GGGAGACATCTATAAAAGATGGATAATCCTGGGATTAAATAAAATAGTAAGAATGTATAG 

• I • 1200 

ProValSerlleLeuAspIleArgGlnGlyProLysGluPfroPheArgAspTyrValAsp 
CCCTGTCAGCATTTTGGACATAAGACAAGGGCCAAAGGAACjCTTTTAGAGACTATGTAGA 
•••••• 

ArgPhePheLysThrLeuArgAlaGluGlnAlaThrGlnCluValLysAsnTrpMetThr 
TAGGTTCTTTAAAACTCTCAGAGCTGAGCAAGCTACACAGGAGCTAAAAAATTCGAT6AC 

1300 

GluThrLeuLeuValGlDAsnAlaAsnProAspCysLysTbrlleLeuLysAlaLeuGly 
AGAAACCTTGCTGGTCCAAAATGCGAATCCAGACTGTAAGACCATTTTAAAAGCATTAG6 

• ••••• 
ProGlyAlaThrLeuGluGluHetMetThrAlaCysGlnGlyValGlyGlyProSerBis 

ACCAGGGGCTACATTAGAAGAAATGATGACAGCATGCCAGGGAGTGGGAGGACCCAGTCA 
1400 .... 

LysAlaArgValLeuAlaGluAlaMetSerGlnAlaThrAsnSerThrAlaAlallellet 
TAAAGCAAGAGITTTGGCTGAGGCAATGAGCCAAGCAACAAATTCAACTGCTGCCATAAT 

• • • • • 1500 
MetGlnArgGlyAsnPheLysGlyGlnLysArglleLysCysPheAsnCysGlyLysGlu 

GATGCAGAGAGGTAATTTTAAGGGCCAGAAAAGAATTAAGTGTTTCAACTGTGGCAAAGA 

• ••••• 
GlyHisLeuAlaArgAsnCysArgAlaProArgLysLysGIyCyaTrpLysCyaGlyLys 

AGGACACCTAGCCAGAAATTGCAGGGCCCCTAGGAAAAAGGGCTGXTGGAAATGTGGGAA 

• • • 1600 p-^POL 

p'hePbeArgGluAanLeu 
GluGlyHisGlnMetLysAspCysThrGluArgGlnAlaAs aPheLeuGlyLys IleTrp 
GGAAGGACACCAAAIGAAAGACTGCACTGAGAGACAGGCTAAriTTTTAGGGAAAATTTC 

• ••••• 
AlaPheProGlnGlyLysAlaArgGluPhePrcSerGluGlnThrArgAlaAsnSerPro 

ProSerHisLysGlyArgProGlyAsnPheLeuGlnSerArgProGluProThrAlaEr-O* 

GCCTTCCCACAAGGGAAGGCCAGGGAATTTCCTTCAGAGCAGACCAGAGCCAACAGCCCC 
1700 

ThrSerArgGluLeuArgValTrpGlyGlyAspLysThrLeuSerGluThrGlyAlaGlu 

ProAlaGluSerPheGlyPheGlyGluGluIleLysProSerGlnLysGlnGluGlnLys 
ACCACCAGAGAGCTTCGGGTTTGGGGAGGAGATAAAACCCTCTCAGAAACAGGAGCAGAA 

• • 1800 

ArgGlnGlylleValSerPheSerPheProGlnlleThrLeuTrpGlnArgProValVal 

AspLysGluLeuTyrProLeuAlaSerLeuLysSerLeuPbeGlyAsnAspGlnLeuSer 
AGACAAGGAATTGTATCCTTTAGCTTCCCTCAAATCACTCTTTGGCAACGACCAGTTCTC 
GAQ^ ...... 

ThrValArgValGlyGlyGlnLeuLysGluAlaLeuLeuAspThrGlyAlaAspAspThr 
Gin 

ACAG rAAGAGTAGGAGGACAGCTAAAAGAAGCTCTATTAGACACAGGAGCAGATGATACA 

' • . . 1900 

ValLeuGluGluIleAsnLeuProGlyLysTrpLysProLysKetlleGlyGlylleGly 
GTATTAGAAGAAATAAATTTGCCAGGAAAATGGAAACCAAAAATGATAGGGGGAATTGGA 

• ••••• 
GlyPhellcLysValArgGlnTyrAspGlnlleLeuIleGlttlleCysGlyLysLysAla 
GGTTTTATCAAAGTAAGACAGTATGATCAAATACTTATAGAAATTTGTGGAAAAAAGGCT 

2000 .... 
IleGlyThrIleLeuValGlyProThrProValAsnIleIleGlyArgAsnMetLeuThr
ATAGGTACAATATTGGTAGGACCTACACCTGTCAACATAATTGGACGAAATATGTTGACT 

• • • • 2100 
GInlleGlyCysThrLeuAsnPheProIleSerProIleGluThrValProValLysLeu 
CAGATTGGTTGTACTTTAAATTTTCCAATTAGTCCTATTGAGACTGTACCAGTAAAATTA 

• •••«• 

LysProGlyKetAspGlyProArgValLysGlaTrpProLeuTbrGluGluLysIleLys 
AAGCCAGGGATGGAT6GCCCAAGGGTTAAACAATGCCCATTGACAGAAGAAAAAATAAAA 

2200 

AlaLeuThrGluZleCysLy sAspMetGluLysGluGlyLysZleLeuLysIleGlyPro 
GCATTAACAGAAATTTGTAAAGATATGGAAAAGGAAGGAAAAATTTTAAAAATT666CCT 

• ••••• 

GluAsnProTyrAsnThrProValPbeAlalleLysLysLysAspSerThrLyaTrpArg 
GAAAATCCATACAATACTCCAGTATTTGCCATAAAGAAAAAA6ACAGCACTAAAT6GAGA 
2300 • . . . 

Ly sLeuValAsnPheArgG luLeuAsnLy sArgThrGlnAspPheTrpGluValGlnLeu 
AAATTAGTGAATTTCAGAGAGCTTAATAAAAGAACTCAAGATTTTTGGGAAGTTCAATTA 

• • 2400 . 

GlylleProHisProAlaG lyLeuLysLysLysLysSerValThrValLeuAspValGly 
GGAATACCACATCCTGCTGGGTTGAAAAAGAAAAAATCAGTCACAGTATTGGATGTGGGC 

• ••••• 
AspAlaTyrPheSerValProLeuAspGluAspPheArgLysTyrThrAlaPheXhrlle 
GATGCATATTTTTCAGTCCCTTTAGATGAAGATTTCAGGAAGTATACTGCATTCACTATA 

2500 

ProSer II eAsnAsnG luTbrProG lylleArgTyrGlnTyrAsnValLeuProG InG ly . 
CCCAGTAT-TAATAATGAGACACCAGGGATTAGATATGAGTACAATGTGCTACCACAGGGA 

• • • • • 

TrpLy sGlySerProAlallePbeGlnSerSerKetThrLysIleLeuGluProPbeArg 
TGGAAAGGATCACCAGCAATATTCCAGAGTAGCATGACAAAAATCTTAGAACCCTTTAGA 
2600 • 

TbrLysAsnProGluIleVallleTyrGlnTyrKetAapAspLeuTyrValGlySerAap 
ACAAAAAATCCAGAAAIAGTCATATACCAATACATGGATGATTTCTATGTAGGGTCTGAT 

• • • • 2700 
LeuGluIleGlyGlnHisArglbrLysIleG luG luLeuArgGIuEisLeuLeuLysTrp 
TTAGAAATAGGACAACATAGAACAAAAATAGAGGAACTAAGAGAACATCTATTGAAATGG 

• ••••• 

G lyPbeTbrThr Pro As pLy sLy s H isG InLy sG luProProPheLeuTrpMetGlyTyr 

GGATTTACCACACCAGACAAAAAGCATCAGAAAGAACCCCCATTTCTTTGGAT6GCGTAT 

2600 • • 

GluLeuHisProAspLysTrpTbrValGlnProIleGlnLeuProAspLysGluSerTrp 

GAACTCCACCCTGACAAATGGACAGTGCAGCCTATACAACTGCCAGACAAGGAAAGCTGG 

• • • • • • 
ThrValAsnAspIleGlnLysLeuValG lyLysLeuAsnTrpAlaSerGlnlleXyrPro 
ACTGTCAATGATATACAGAAATTGGTGGGAAAACTAAATTGGGCAAGTCAGATTTATCCA 

2900 • . . . 

Gly II eLy fi V« ILy fiC InLeuCy sLyaLeuLeuArgClyAlaLysAlaLeuThrAspIA^- 
GGAATTAAAGTAAAGCAATTATGTAAACTCCTTAGGGGAGCAAAAGCACTAACAGACATA 

• • • • 3000 
ValProLeuThrAlaGluAlaGluLeuG-IuLeuAlaG luAsnArgCIuIleLeuLyaGlu 
G^-A 6 OA T^'AA-CT 6 C AG'A 6 6 C AG A AT TAG A AT TG G C A ^ 
ProValHisGlyValTyrTyrAspProSerLysAspLeuIleAlaGluIleGlnLysGln
CCAGTGCATGGGGTATATTATGACCCATCAAAAGACTTAATAGCAGAAATACAGAAGCAG 

3100 

GlyGInGlyGlnTrpThrTyrGloIleTyrGlDGluGloTyrLysAsnLeuLysThrGly 
GGGCAAGGTCAATGGACATATCAAATATACCAAGAGCAATi^TAAAAATCTGAAAACAGGG 

• • • • • • 
LysTy rAlaArglleLysSerAIaHisThrAsnAspValLysGlnLeuIhrGluAlaVal 
AAGTATGCAAGAATAAAGTCTGCCCACACTAATGATGTAAAACAATTAACAGAAGCACTG 

9 . 3200 e • , « » 

GlnLysIleAlaGlnGluSerlleVallleTrpGlyLysThrProLysPheArgLeuPro 
CAAAAGATAGCCCAAGAAAGCATAGTAATATGGCGAAAAACTCCTAAATTTAGACTACCC 

3300 

IleGlnLyBCluTbrTrpGluAlaTrpTrpThrCluTyrTrpGXnAlaThrTrpIlePro 
ATACAAAAAGAAACATGGGAGGCATGGIGCACAGAATATTGGCAAGCCACCTGGATCCCT 

• ••••• 
GluTrpGluPheValAsnThrProProLeuValLysLeuTrpTyrGlnLeuGluThrGlu 
GAATGGGAGTTTGTCAATACTCCTCCCCTAGTAAAACTATGGTACCAGTTAGAAACAGAA 

3400 

ProIleValGlyAlaGluThrPheTyrValAspGlyAlaAlaAsnArgGluThrLysLys 
CCCATAGTAGGAGCAGAAACTTTCTATGTAGATGGGGCAGGTAATAGAGAAACTAAAAAG 

• • • ' • • • 
GlyLy sAlaGlyTy rValThrAspArgG lyArgGlnLys Valval SerLeuThrGluThr 

—GGAAAAGCAGGATATGTTACTGACAGAGGAAGACAAAAGGTTGTCXCCTTAACTGAAACA 
« 3300 • • • • 

ThrAsTiGlnLysThrGluLeuGlnAlalleHisLeuAlaLeuGlnAspSerGlySerGlu 
ACAAATCAGAAGACTGAATTACAAGCAATCCACTTAGCTTTACAGGATTCACGATCAGAA 

3600 

ValAsnlleValThrAspSerGlnTyrAlaLeuGlyllelleGlnAlaGlnProAspLya 
GTAAACATAGTAACAGACTCACAGTATGCATTAGGGATTATTCAAGCACAACCAGATAAA 

• • . • • • • 

SerGluSerGluIleValAsnGlnllelleGluGlnLeuIleGlnLysAspLysValTyr 
AGTGAATCAGAGATTGTTAATCAAATAATAGAGCAATTAATACAGAAGGACAAGGTCXAC 

3700 

LeuSerTrpValProAlaHisLysGlylleGlyClyAsnGluGlnValAspLysLeuVal 
CTGTCATGGGTACCAGCACACAAAGGGATTGGAGGAAATGAACAAGTAGATAAATTAGTC 

• ••••• 

SerSerGlylleArgLysValLeuPheLeuAspGlylleAspLysAlaGInGluGluHia 
AGCAGTGGAATCAGAAAGGTACTATTTTTAGATCGGATAGATAAGCCTCAAGAAGAACAT 

• 3800 - ^ 

GluLy sTyrHisSerAsnTrpArgAlaKetAlaSerAspPheAsnLeuProProIleVal 
GAAAAATATCACAGCAATTGGAGAGCAATGGCTAGTGACTTTAATCTACCACCTATAGTA 

3900 

AlaLysCluIleValAlaSerCysAspLysCysGlnLeuLysGlyGluAlaMetHisGly 
GCGAAGGAAATAGIAGCCAGCTGTGATAAATGTCAACTAAAAGGGGAAGCCAIGCATGGA 
, , ^ • - • ■ • • 

GlnValAspCysSerProG ly II eTrpGlnLeuAs pCy sThrHisLeuG luG lyLy s lie 
CAAGTAGACTGTAGTCCAGGGATATGGCAATTAGATTGCACACATCTAGAAGGAAAAATA 

4000 

IlelleValAlaValHisValAlaSerGlyTyrlleGluAlaGloVallleProAlaGlu 
ATCATAGTAGCAGTCCATGTAGCCAGTGGATA7ATAGAAGCAGAAGTTATCCCAGCAGAA 

ThrGlyGlnCluThrAlaTyrPhellcLeuLyaLeuAlaGlyArgTrpProValLyaVal 
— "jreXG GAXXG GAGXC AGCA T ACTTTXTAXTAIKAATTAG CAGtSAAGATC GCCAG TA AAAG T A 

4100 . • . • 
ValHisThrAspAsnGlySerAsnPheThrSerAlaAlaValLysAlaAlaCysTrpTrp
CTACACACAGACAAIGGCAGCAATTTCACCAGTCCTGCACTTAAAGCAGCCTGTTGCTCC 

AlaAsnlleLysGlnGluPheGlylleProTyrAsnProClnSerGlnGlyValValClu 
GCAAATATCAAACAGGAATTTGGAATTCCCTACAACCCCCAAA6TCAAGGAGTAGTGGAA 

• ••••• 

SerMetAsnLyeGluLeuLysLysIlelleGlyGlnValArgGluGlnAlaGluEisLeu.. 

TCTATGAATAAGGAATTAAAGAAAATCATAGGGCAGGTAAGAGAGCAAGCTGAACACCTT 

4300 

LysThrAlaValGlnMetAlaValPhclleHisAsnPhcLysArgLysGlyClylleGly 
AAGACAGCACTACAAATGGCAGTCTTCATTCACAATTTTAAAAGAAAAGGGGGGATTGC6 

GlyTyrSerAlaGlyGluArgllelleAspMetlleAlaThrAspIleGlnThrLysGlu 
GGGTACAGTCCAGGGGAAAGAATAATAGACATGATAGCAACAGACATACAAACTAAAGAA- 

, 4400 « • • • 

LeuGlnLysGlnlleThrLysIleGlnAsnPheArgValTyrTyrArgABpAsnArgAsp 
TTACAAAAACAAATTACAAAAATTCAAAATTTTCGGGTTTATTACAGGGACAACAGAGAC 

4500 

ProIleTrpLysGlyProAlaLysLeuLeuTrpLysGlyGluGlyAlaValVallleCIn 
CCAATTTGGAAAGGACCAGCAAAACTACTCTGGAAAGGTGAAGCGCCAGTAGTAATACAG 

• • • • • j^Q • 

AspAsnSerAspIleLysValValProArgArgLysAlaLysIlelleArgAspTyrGly 

- - - - , . . . „ . -. ., „ . Jie 1 6 lu 

GACAATAGTGATATAAAGGTAGTACCAAGAAGAAAAGCAAAAATCATTAGGGATIATCCA 

4600 POLgt- • 
LysGlnHetAlaGlyAspAspGysValAlaGlyGlyGlnAspGluAsp 

AsnArgTrpGliiValMetlleValTrpGlnValAspArgMetArglleArgThrTrpHis 
AAACAGATGGCAGGTGATGATTGTGTGGCAGGTGGACAGGATGAGGAirrAGAACATGGCA 

• . • • • • • 

ScrLeuValLysHisHisMctTyrValSerLysLysAlaLysAsnTrpPheTyrArgHis 
CAGTTTAGTAAAACATCATATGTATGTCTCAAAGAAAGCTAAAAATTGGTTTTATAGACA 
4700 . . . • 

HisTyrGluSerArgHisProLysValSerSerGluValflisIleProLeuGlyAspAla 
TCACTATGAAAGCAGGCATCeAAAAGTAAGTTCACAAGTACACATCCCACTAGGGGATGC 

4800 

ArgLeuValValArgThrTyrTrpGlyLeuGlnThrGlyGluLysAspTrpHisLeuGly 
TAGATTAGTAGTAAGAACATATTGGGCTCTGCAAACAGGAGAAAAAGACTGCCACTTGGG 

• • • • • • 

HisGlyValSerlleGluTrpArgGlnLysArgTyrSerTbrGlnLeuAspProAspLeu 
TCATGGGGTCTCCATAGAATGGAGGCAGAAAAGATATAGCACACAACTAGATCCTGACCT 

4900 

AlaAspGlnLeuIleKisLeuTyrTyrPheAspCysPhcSerCluSerAlallcArsGln 
AGCAGACCAACTGATTCATCTGTACTATTTTGATTGXTTTTCAGAATCTCCCAIAAGACA 

• • • • • • 

AlalleLcuGlyHislleValSerProArgCysAspTyrGlnAlaClyHisAsnLysVal 
AGCCATATTAGGACATATAGTTAGTCCTAGGTGTGATTATCAAGCAGCACATAACAAGGT 
» 5000 • « • • 

GlySerLeuGlnTyrLeuAlaLeuThrAlaLeulleAlaProLyfiLysThrArgProPro 
AGGATCTTTACAGTATTTGGCACTAACAGCATTAATAGCACCAAAAAAGACAAGGCCACC 

• r^R • • 5100 

llletGluGInAlaProAlaAspGlnGIy 

LeuP-ro SerJVaXAxgLy sLeuT.hrGlu As.pAxlgTr.pAsjiLy 8.^^ InTSurLy «G ly 

TTC C C T A"^ TG-t:^A^^ A AG e5AAGA<;AA«.A4:-^ 
• • • Q*--f ' • • : %

FroGlDArgGluProUisAfinG luTrpThrljeuGluLeuLeuGluGluLeuLysGlnGlu 
HisArgGlySerHisThrMetAsnGlyHid 

ccacagagggagccacacaatgaatggacaiItagaactttiIagaggagcttaagcaagaa 

• 5200 I 

AlaValArgHisPheProArglleTrpLeuEisSerLeuGIyGlnHiBlleTyrGluThr 

gctgtcagacactttcctaggatatggctccatagtttaggacaacatatctatgaaact 



TyrGlyAspThrlrpG luG ly ValGluAlallelleArgSerLeuGlnG InLeuLeuPhe 

tatggggatacctgggaaggagttgaagctataataagaagtctgcaacaactgctgttt 

5300 • • * • • 

IleEisPheArglleGlyCysGlnBisSerArglleGIylleThrArgGlnArgArgAla 

attcatttcagaattgggtgtcaacatagcagaataggcattactcgacagagaagagca 

r^s • R<Pn • • • ^^^^ 

ArgAsnG ly SerSerAr gSet 

MetAspProValAspPx oAsnLeuGluProTrpAanHisProGlySerGInProArg 

AGAAATGGATCCAGTAGATCCTAACTTAGAGCCCTGGAACCATCCAGGGAGTCAGCCTAG 

• • • • • • 

ThrProCy sAsnLyeCy sTyrCy sLysLysCysCysTyrHisCy sGlnHetCysPbelle 

GACGCCTTGTAATAAGTGTTATTG7AAAAAGTGCTGCTATCATTGCCAAATGTGCTTCAV- 

. . . . - • 5500 .„ _ . 

ThrLysGlyLeuGly II eSerTyrGlyArgLysLysArgArgGloArgArgArgProPro 

AACGAAAGGCTTAGGCATCTCCTATGGCAGGAAGAAGCGGAGACAGCGACGAAGACCTCC 



lIuGIn 
lAGCAGp 



GlnG lyAsnGlnAl^aHisG InAspFroLeuProG] 
TCAGGGCAATCAGGCTCATCAAGATCCTCTACCAGAGCAGfAAGTAGTATATGTAATACA 
5600 

ACCTTIAGTGATATTAGCAATAGTAGCATTAGTAGTAACGCTAATAATAGCAATAGTTGT 

• • • • • 5700 
GTGGACCATAGTATTTATAGAAATTAGGAAAATAAGAAGACAAAGGAAAATAGACAGGTT 

r->ENV 

jMetArgValArgGluIleGlnArg 
GATTGATAGAATAAGAGAAAGAGCAGAAGATAGTGGciATGAGAGTGAGGGAGATACAGA 

5800 

AsnTy rG InAsnTrpTrpArgTrpG lyMe tHet LeuLeuGlyKe tLeuKe tThrCy sScr 
GGAATTATCAAAACTGGTGGAGATGGGGCATGATGCTCCTTGGGATGTTGATCACCTGTA 

• ••••• 
IleAlaGluAspLeuTrpValThrValTyrTyrGlyValProValTrpLysGluAIaThr 

GTATTGCAGAAGATTTGTGGGTTACAGTTTATTATGGGGTACCTGTGTGGAAAGAAGCAA 
5900 . • . • 

ThrThrLeuPheCy sAlaSerAspAlaLysSerTyrGluThrGluValHisAsnlleTrp 
CCACTACTGTATTTTGTGCATCAGATGCTAAATCATATGAAACAGAAGTACATAACATCT 

• • • • • 6000 
AlaThrHisAlaCysValProThrAspProAsnPToGlnGIuIleGluLeuGluAsnVal 

GGGCTACACATGCCTGTGTACCCACGGACCCCAACCCACAAGAAATAGAACTGGAAAATG 

• ••••• 
ThrGluGlyPheAsnHetTrpLysAsnAsnKetValGIuGlnHetBisGluAspIlelle 

TCACAGAAGGGTTTAACATGTGGAAAAATAACATGGTGGAGCAGATGCATGAGGATATAA 

• • • 6100 • 
SerLeuTrpAspGlnSerLeuLysProCysValLysLeuThrProLeuCysValThrLeu
TCAGTTTATGCGATCAAAGCCTAAAACCATGTCTAAAGCTAACCCCACTCTGTGTCACTT 

• • • • • : • 
AsnCysThrAsnValAsoGlyThrAlaValAsnGlyThrAsnAlaClySerAsnArgThr 

TAAACTCCACTAATGTGAATGGGACTGCTGTGAATGGGACTAATGCTGGGACTAATAGCA 
6200 • • • • 

AsnAlaGluLeuLysMetGluIleGlyGluValLysAsDCysSerPheAsnlleThrPro 
CTAATGCAGAATTGAAAATGGAAATTGGAGAAGTGAAAAACTGCTCTTTCAATATAACCC 

6300 

ValGlySerAspLysArgGlnGluTyrAIaThrPheTyTAsnLeuAspLeuValGlnlle 
CAGTAGGAAGTGATAAAAGGCAAGAATATGCAACTTTTTATAACCTTGATCTAGTACAAA 

• • • • • • 
AspAspSerAspAsnSerSerTyrArgLeuIleAenCysAanThrSerVallleThrGln 

TAGATGATAGTGATAATAGTAGTTATAGGCTAATAAATTGTAATACCTCAGTAATTACAC 

6400 

AlaCysProLysValThrPheAspProIleProIleHisTyrCysAlaProAlaGlyPhe 
AGGCTTGTCCAAAGGTAACCTTTGATCCAATTCCCATACATTATTGTGCCCCAGCTGCTT 

• • • • ' • • . 
AlalleLeuLysCysAsnAspLysLyfiPheAsnGlyThrGluIleCysLysAsnValSer 

TTGCAATTCTAAAGTGTAATGATAAGAAGTTCAATGGAACGGAAATATGTAAAAATGTCA 
6500 • • • • 

ThrValGlnCysThrHisGlylleLysProValValSerThrGlnLeuLeuLeuAsnGly 
GTACAGTACAATGTACACATGGAATTAAGCCAGTGGTGTCAACTCAACTGCTGTTAAATG 

6600 

SerLeuAlaGluGluGluIleHetlLeAxgSerGluAsnLeuThrAspAsnlhrLysAsn 
GCAGTCTAGCAGAAGAAGAGATAATGATTAGATCTGAAAATCTCACAGACAATACTAAAA 
« • • • • • 

llelleValGlnLeuAsnGluThrValThrlleAsnCysThrArgProGlyAsnAsnThr 
ACATAATAGTACAGCTTAATGAAACTGTAACAATTAATTGTACAAGGCCTCGAAACAATA 

6700 

ArgArgGlylleEisPheGlyProGlyGlnAlaLeuTyrThrThrGlylleValGlyAsp 
CAAGAAGAGGGATACATTTCGGCCCAGGGCAAGCACTCTATACAACAGGGATAGTAGGAG 

• • • • • • 
IleArgArgAlaTyrCysThrlleAsnGluThrGluTrpAspLysThrLeuGlnGlnVal 

ATATAAGAAGAGCATATTGTACTATTAATGAAAC/.GAATGGGATAAAACTTTACAACAGG 

6&00 • • 

AlaValLy.sLeuGlySerLeuLeuAsnLysThrLysIlellePheAsnSerSerSerGly. 

TAGCTGTAAAACTAGGAAGCCTTCTTAACAAAACAAAAATAATTTTTAATTCATCCTCAG 

. . • . 6900 

GlyAspProGluIleThrThrHisSerPheAstiCysArgGlyGluPhePheTyrCysAsn 
GAGGGGACCCAGAAATTACAACACACAGTTTTAATTGTAGAGGGGAATTTTTCTACTGTA 

• • • • • • 
ThrSerLysLeuPheAsnSerThrTrpGlnAanAsnGlyAlaArgLeuSerAsnSerThr 

ATACATCAAAACTGTTTAATAGTACATGGCAGAATAATGGTGCAAGACTAAGTAAIAGCA 

, ^ —7000 

GluSerThrGly Ser IleThrLeuProCysArglleLysGlnllelleAsnlietTrpGln 
CAGAGTCAACTGGTAGTATCACACTCCCATGCAGAATAAAACAAATTATAAATATGTGGC 

• • • • • • 
Ly&ThrGlyLysAlaHetTyrAlaProProIleAlaGlyVallleAsnCysLeuSerAan 

AGAAAACAGGAAAAGCTATGTATGCCCCTCCCATCGCAGGAGTCATCAACTGTTTATCAA 
7 100 . . • . 

I TeT tTf GTy L~ei> IleLeuThr A'rgA'sprG ly G 1 y A sn SerS erAspAs nSerAs p As nG lu 

ATATTAfcAGGGdtGiAtATTAACAAGAGA^^^^ 

7200 
ThrLeuArgProGlyGlyGlyAspMetArgAspAsnTrpIleSerGluLeuTyrLysTyr
AGACCTTAAGACCTGGAGGAGGAGATATGAGGGACAATTGGATAAGTGAATTATATAAAT 

• • • m m m 

LysValValArglleGIuProLeuGlyValAlaProThrLysAlaLysArgArgValVal 
ATAAAGTAGTAAGAATTGAACCCCTAGGAGTAGCACCCACCAAGGCAAAGAGAAGAGTGG 

7300 

GluArgGluLysArgAlalleGlyLeuGlyAlaMetPheLeuGlyPheL'euGlyAlaAla 
TGGAAAGAGAAAAAAGAGCAATAGGACTAGGAGCCATGTTCCTTGGGTTCTXGGGAGCAG 

• • . • • • • 
GlySerThrMetGlyAlaAlaSerLeuThrLeuThrValGlnAlaArgGlDLeuLeuSer 

CAGGAAGCACGATGGGCCCAGCCTCACTAACGCTGACGGTACAGGCCAGACAGTTACTGT 
7400 . . . . 

GlylleValGlnG IdG InAsnAsnLeuLeuArgAlalleGluAlaGlnGlnBisLeuLeu 
CTGGTATAGTGCAACAGCAAAACAATTTGCTGAGGGCTATAGAGGCGCAACAGCATCTGT 

• • • • • 7500 
GlnLeuThrValTrpGlylleLysGlnLeuGlnAlaArgValLeuAlaValGluArgTyr 

TGCAACTCACGGTCTGGGGCATTAAACAGCTCCAGGCAAGAGTCC7GGCTGTGGAAAGAT 

• • • • • • 
LeuG InAspGlnArgLeuLcuG lyMetTrpG lyCysSerGlyLysEisIleCy sThrThr 

ACCTACAGGATCAACGGCTCCTAGGAATGTGGGGTTCCTCTGGAAAACACATTTGCACCA 

• • • 7600 • • 
PbeValProTrpAsnSerSerTrpSerAsnArgSerLeuAspAspIleTrpAanAsnMet 

CATTTGTGCCTTGGAACTCTAGTTGGAGTAATAGATCTCTAGATGACATTTGGAATAATA 

• • • • • • 
TbrTrplIetGlnTrpGluLysGluIleSerAsnTyrlhrGlyllelleTyrAsnLeuIle 

TGACCTGGATGCAGTGGGAAAAAGAAATTAGCAATTACACAGGCATAATATACAACTTAA 
7700 . . . 

GluGluSerGlnlleGlnG InGluLysAsnG luLysGluLeuLeuGluLeuAspLysTrp 
TTGAAGAATCGCAAATCCAGCAAGAAAAGAATGAAAAGGAATTATTGGAATTGGACAAGX 

7800 

AlaSerLeuTrpAsnTrpPbeSer IleSerLysTrpLeuTrpTyrlleArgllePheZle 
GGGCAAGTTTGTGGAATTGGTTTAGCATATCAAAATGGCTGTGGTATATAAGAATATTCA 

• • • • • • 
IleValValGlyGlyLeuIleGlyLeuArgllellePbeAlaValLeuSerLeuValAan 

TAATAGTAGTAGGAGGCTTAATAGGTTTAAGAATAATTTTTGCTGTGCTTTCTTTACTAA 

• 7900 • • 

ArgValArgGlnGlyTyrSerProLeuSerLcuGlnThrLeuLcuProTbrProArgGly 
ATAGAGTTAGGCAGGGA-TACTCACOTCTGTCGTTGCAGACCCTCCTCCCAACACCGAGGG 

• • • • • • 
ProProAspArgProGluGlylleGluGluGluGlyGlyGluGlnGlyArgGlyArgSer 

GACCACCCGACAGGCCCGAAGGAATAGAAGAAGAAGGTGGAGAGCAAGGCAGAGGCAGAT 
SOOO .... 

IleArgLeuValAsnG lyPbeSerAlaLeuIleTrpAspAspLeuArgAsnLeuCysLeu 
CAATTCGATTGGTGAACGGATTCTCAGCACXTATCTGGGAGGACC7GAGGAACCTGTGCC 

• • 0100 

PheSerTyrHisArgLeuArgAspLeuLeuLeuIleAlaThrArglleValGluLeuLeu 
TCTTCAGTTACCACCGCTTGAGAGAC7TACTCTTAATTGCAACGAGGATTGTGGAACTTC 

• • • « • • 

ClyArgArgG lyTrpGluAlaLeuLysTyrLeuTrpAsnLcuLeuGlnTyrTrpGlyC-ln. 

TGGGACGCAGGGCGTGGGAAGCCCTCAAATATCTGTGGAATCTCCTGCAATATTGGGCTC 

8200 
GluLeuLysAsnSerAlaIleSerLeuLeuAsnThrThrAlaIleAlaValAlaGluCy*
AGGAACTCAAGAATAGTGCTATTA6CTTGCTTAATACCACAGCAATACCACTACCTCAAT 

TbrAapArgVallleGluIleClyGlnArgPheClyArgAlalleLeuBislleProArg 
CCACACATAGGCTTATAGAAATAGGACAAAGATTTCGTAGAGCTATTCTCCACATACCTA 

. 8300 . ,• >P • • . • 

EW «-| WefGlyGlyLysTrpSerLya 

ArElleArgGlnG lyPheCluArgAlaLeuLeuj I i 
GAAGAATTAGACAGGGCTTCGAAAGGGCTTTGCTA|rAAC|ATGGCTGGCAACTG6TCAAAA 

, . • • • 8400 

SerSerlleValGlyTrpProLyalleArgCluArglleArgArgThrProProThrGlu 
AGTAGCATAGTAGGATGGCCTAAGATTAGCGAAAGAATAAGACGAACTCCCCCAACAGAA 

ThrGlyValClyAlaValSerGlnAspAlaValSerGlnAspLeuABplysCysGlyAla 
ACAGGACTAGGAGCAGTATCTCAAGATGCAGTATCTCAAGATTTAGATAAATGTGGAGCA 

8500 

AlaAlaSerSerSerProAlaAlaAsnAsnAlaSerCysGluProProGluGluGluGlu 
GCCCCAAGCAGCAGTCCAGCAGCTAATAATGCTAGTTGTGAACCACCAGAACAAGAGGAG 

GluValGlyPheProValArgProGlnValProLeuArgProMetThrTyrlysGlyAla 
GAGGTAGGCTITCCAGTCCGTCCTCAGGTACCTTTAAGACCAATGACTTATAAAGGAGCT 

8600 . ^3 • 

P'lr/kspLeuSerHisPheLeuLysGluLysGlyGlWLeuAspGlyLeuValTrpSerPro 

TTTGATCTCAGCCACTTTTTAAAAGAAAAGCGGGdACTGGATGGGTTAGTTTGGTCCCCA 

. . . . • o'"" 

LvsAreGlnClulleLeuAspLeuTrpValTyrHisThrGlnGlyTyrPheProAspTrp 
AAAAGACAAGAAATCCTTGATCTGTGGGTCTACCACACACAAGGCIACITCCCTGATTGC 

GlnAsnTyrThrProGlyProGlylleArgPheProleuThrPheGlyTrpCysPheLys 
CAGAATTACACACCAGGGCCAGGGATTAGATTCCCACTGACCTTCGGATGGTGCTTTAAG 

8800 . • 

LeuValProKetSerProGluGluValGluGluAlaAsnGluGlyGluAsnAsnCysLeu 
TTAGTACCAATGAGTCCAGAGGAAGTAGAGGAGGCCAATGAAGGAGAGAACAACTGTCTG 



. • • • 

LeuKisProIleSerGlnEisGlylletGluAspAlaGluArgGluValLeuLysTrpLys 

ttacaccctattAgccaxcatggaatggaggacgcagaaagagaagtgctaaaatggaag 

8900 ...» 
PheAspSexSerLeuAlaLeuArgHisArgAlaArsGluGlnEisProGluTyrTyrLya 
TTTGACAGCAGCCTAGCACTAAGACACAGAGCCAGAGAACAACATCCGGACTACTACAAA 
P . . . • • 9000 

GACTGclrGACACAGAAGTTGCTGACAGGGGACTTTCCGCTGGGGACTTTCCACGGGAGGC 
GTAACTTGGGCGGGACCGGGGACTGGCTAACCCTCAGATGCTGCATATAAGCAGCTCCTT 

ttcgcctgtactgIggtctctcttgttagaccaggtcgagcccgggacctctctggctacc 
aaggaacccactgcttaagcctcaataaagcttgccttgagtgcctcaa 

9200 






